Deoxynucleotide linkers to be attached to a cloned DNA coding sequence.

The present invention is based upon a general principle of
providing specific oligonucleotide segments ("linkers",
herein) to be attached in sequence to a cloned DNA coding
segment. The linkers of the present invention confer desired
functional properties on the expression of the protein coded
by the coding sequence. Using linkers of the present
invention, the desired protein may be expressed either as a
fusion or non-fusion protein. A linker coding for an
additional sequence of amino acids may be attached, the
sequence being chosen to provide properties exploitable in a
simplified purification process. A linker coding for an amino
acid sequence of the extended specific cleavage site of a
proteolytic enzyme is provided, as well as specific cleavage
linkers for simple specific cleavage sites.
BACKGROUND OF THE INVENTION

The invention herein provides for deoxynucleotide 
sequences coding for amino acid sequences which contain 
specific cleavage sites. The deoxynucleotide sequences are
herein termed specific cleavage linkers and are useful in 
recombinant DNA technology. 

Recent advances in biochemistry and in recombinant DNA
technology have made it possible to achieve the synthesis of
specific proteins under controlled conditions independent of
the higher organism from which they are normally isolated.
Such biochemical synthetic methods employ enzymes and
subcellular components of the protein synthesizing machinery
of living cells, either in vitro, in cell-free systems, or in
vivo, in microorganisms. In either case, the key element is
provision of a deoxyribonucleic acid (DNA) of specific
sequence, which contains the information necessary to specify
the desired amino acid sequence. Such a specific DNA is herein
termed a DNA coding segment. The coding relationship whereby
a deoxynucleotide sequence is used to specify the amino acid
sequence of a protein is described briefly, infra, and
operates according to a fundamental set of principles that
obtain throughout the whole of the known realm of living
organisms.

A cloned DNA may be used to specify the amino acid sequence
of proteins synthesized by in vitro systems. DNA-directed
protein synthesizing systems are well known in the art; see,
e.g., Zubay, G., Ann. Rev. Genetics 7, 267 (1973). In
addition, single-stranded DNA can be induced to act as
messenger RNA in vitro, resulting in high-fidelity
translation of the DNA sequence (Salas, J., et al., J. Biol.
Chem. 243, 1012 (1968)). Other techniques well known in the
art may be used in combination with the above procedures to
enhance yields.

Developments in recombinant DNA technology have made it
possible to isolate specific genes or portions thereof from
higher organisms, such as man and other mammals, and to
transfer the genes or fragments to a microorganism, such as
bacteria or yeast. The transferred gene is replicated and
propagated as the transformed microorganism replicates. As
a result, the transformed microorganism may become endowed
with the capacity to make whatever protein the gene or
fragment encodes, whether it be an enzyme, a hormone, an
antigen or an antibody, or a portion thereof. The
microorganism passes on this capability to its progeny, so
that in effect the transfer has resulted in a new strain
having the described capability. See, for example, Ullrich,
A., et al., Science 196, 1313 (1977), and Seeburg, P.H., et
al., Nature 270, 486 (1977).

A basic fact underlying the application of this technology
for practical purposes is that DNA of all living organisms,
from microbes to man, is chemically similar, being composed
of the same four nucleotides. The significant differences
lie in the sequences of these nucleotides in the polymeric
DNA molecule. The nucleotide sequences are used to specify
the amino acid sequences of proteins that comprise the
organism. Although most of the proteins of different
organisms differ from each other, the coding relationship
between nucleotide sequence and amino acid sequence is
fundamentally the same for all organisms. For example, the
same nucleotide sequence which is the coding segment for the
amino acid sequence of human growth hormone in human
pituitary cells will, when transferred to a microorganism,
be recognized as coding for the same amino acid sequence.

Abbreviations used herein are given in Table 1.

Table 1

DNA  - deoxyribonucleic acid
RNA  - ribonucleic acid
cDNA - complementary DNA (enzymatically synthesized from an
       mRNA sequence)
mRNA - messenger RNA
A    - adenine
T    - thymine
G    - guanine
C    - cytosine
U    - uracil
ATP  - adenosine triphosphate
TTP  - thymidine triphosphate
EDTA - ethylenediaminetetraacetic acid
dATP - deoxyadenosine triphosphate
dGTP - deoxyguanosine triphosphate
dCTP - deoxycytidine triphosphate

The coding relationships between nucleotide sequence in
DNA and amino acid sequence in protein are collectively known
as the genetic code, shown in Table 2.



Table 2
Genetic Code

Phenylalanine (Phe)   TTK     Histidine (His)       CAK
Leucine (Leu)         XTY     Glutamine (Gln)       CAJ
Isoleucine (Ile)      ATM     Asparagine (Asn)      AAK
Methionine (Met)      ATG     Lysine (Lys)          AAJ
Valine (Val)          GTL     Aspartic acid (Asp)   GAK
Serine (Ser)          QRS     Glutamic acid (Glu)   GAJ
Proline (Pro)         CCL     Cysteine (Cys)        TGK
Threonine (Thr)       ACL     Tryptophan (Try)      TGG
Alanine (Ala)         GCL     Arginine (Arg)        WGZ
Tyrosine (Tyr)        TAK     Glycine (Gly)         GGL
Termination signal    TAJ
Termination signal    TGA

Key: Each 3-letter deoxynucleotide triplet corresponds to
a trinucleotide of mRNA, having a 5'-end on the left and a
3'-end on the right. All DNA sequences given herein are those
of the strand whose sequence corresponds to the mRNA
sequence, with thymine substituted for uracil. The letters
stand for the purine or pyrimidine bases forming the
deoxynucleotide sequence.

A = adenine                 J = A or G
G = guanine                 K = T or C
C = cytosine                L = A, T, C or G
T = thymine                 M = A, C or T

X = T or C if Y is A or G
X = C if Y is C or T
Y = A, G, C or T if X is C
Y = A or G if X is T
W = C or A if Z is A or G
W = C if Z is C or T
Z = A, G, C or T if W is C
Z = A or G if W is A
QR = TC if S is A, G, C or T
QR = AG if S is T or C
S = A, G, C or T if QR is TC
S = T or C if QR is AG
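As an illustrative aid only, the degenerate notation of Table 2 can be expanded mechanically into the concrete codons it denotes. The Python sketch below assumes the Key as given; the names `SIMPLE`, `COUPLED`, `TABLE2` and `expand` are hypothetical, and the coupled codes XTY, WGZ and QRS are written out directly from their paired conditions rather than parsed from them.

```python
from itertools import product

# Single-letter codes from the Key to Table 2 (each letter stands alone).
SIMPLE = {
    "A": "A", "G": "G", "C": "C", "T": "T",
    "J": "AG",    # J = A or G
    "K": "TC",    # K = T or C
    "L": "ATCG",  # L = A, T, C or G
    "M": "ATC",   # M = A, C or T
}

# Codes whose letters constrain each other (X/Y, W/Z, QR/S), enumerated
# directly from the paired conditions in the Key.
COUPLED = {
    "XTY": ["TTA", "TTG", "CTA", "CTC", "CTG", "CTT"],
    "WGZ": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"],
    "QRS": ["TCA", "TCC", "TCG", "TCT", "AGT", "AGC"],
}

# Table 2 as (amino acid, code) pairs; the two stop entries share a name,
# so a list is used instead of a dict.
TABLE2 = [
    ("Phe", "TTK"), ("Leu", "XTY"), ("Ile", "ATM"), ("Met", "ATG"),
    ("Val", "GTL"), ("Ser", "QRS"), ("Pro", "CCL"), ("Thr", "ACL"),
    ("Ala", "GCL"), ("Tyr", "TAK"), ("Stop", "TAJ"), ("Stop", "TGA"),
    ("His", "CAK"), ("Gln", "CAJ"), ("Asn", "AAK"), ("Lys", "AAJ"),
    ("Asp", "GAK"), ("Glu", "GAJ"), ("Cys", "TGK"), ("Try", "TGG"),
    ("Arg", "WGZ"), ("Gly", "GGL"),
]


def expand(code: str) -> list[str]:
    """Return the sorted list of concrete codons denoted by one Table 2 code."""
    if code in COUPLED:
        return sorted(COUPLED[code])
    return sorted("".join(bases) for bases in product(*(SIMPLE[ch] for ch in code)))


# Every one of the 64 possible triplets should appear exactly once.
all_codons = [codon for _, code in TABLE2 for codon in expand(code)]
assert len(all_codons) == 64 and len(set(all_codons)) == 64
print(expand("ATM"))
```

The final check is a convenient consistency test: if the notation is read correctly, the 22 entries of the table partition all 64 triplets with no overlaps.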
An important feature of the code, for present purposes,
is the fact that each amino acid is specified by a
trinucleotide sequence, also known as a nucleotide triplet.
The phosphodiester bonds joining adjacent triplets are
chemically indistinguishable from all other internucleotide
bonds in DNA. Therefore the nucleotide sequence cannot be
read to code for a unique amino acid sequence without
additional information to determine the reading frame, which
is the term used to denote the grouping of triplets used by
the cell in decoding the genetic message.
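The dependence on reading frame can be illustrated with a short Python sketch, offered as an example only. It builds the standard genetic code (the same assignments as Table 2, written with one-letter amino acid symbols and `*` for a termination signal) and translates one hypothetical 12-nucleotide sequence in each of the three possible frames; the function name `translate` is illustrative.

```python
# Standard genetic code laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Read dna in triplets starting at offset `frame` (0, 1 or 2).

    An incomplete trailing triplet is ignored; '*' marks a termination signal.
    """
    seq = dna[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


dna = "ATGGCTGAAAAA"
for frame in range(3):
    print(frame, translate(dna, frame))
```

The same twelve nucleotides read as MAEK, WLK or G*K depending solely on where the grouping into triplets begins; nothing in the chemistry of the chain marks the correct frame.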

In procaryotic cells, the endogenous coding segments are
typically preceded by nucleotide sequences having the
functions of initiator of transcription (mRNA synthesis) and
initiator of translation (protein synthesis), termed the
promoter and ribosomal binding site, respectively. The coding
segment begins around 3-11 nucleotides distant from the
ribosomal binding site. The exact number of nucleotides
intervening between the ribosomal binding site and the
initiation codon of the coding segment does not appear to be
critical for translation of the coding segment in correct
reading frame. The term "expression control segment" is used
herein to denote the nucleotide sequences comprising a
promoter, a ribosomal binding site and a 3-11 nucleotide
spacer following the ribosomal binding site. In eucaryotic
cells, regulation of transcription and translation may be
somewhat more complicated, but also involves such nucleotide
sequences.
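The spacing requirement just defined can be expressed as a simple check. The Python sketch below is illustrative only: it assumes the ribosomal binding site is supplied as a known motif (the example uses AGGAGG, a commonly cited Shine-Dalgarno-type sequence, which is an assumption not stated in the text), takes the first ATG downstream of that motif as the initiation codon, and tests whether the intervening spacer falls in the 3-11 nucleotide range. The function names and the example sequence are hypothetical.

```python
from typing import Optional


def spacer_length(seq: str, rbs: str, start_codon: str = "ATG") -> Optional[int]:
    """Nucleotides between the end of the first `rbs` motif and the next start codon.

    Returns None if either the motif or a downstream start codon is missing.
    """
    rbs_at = seq.find(rbs)
    if rbs_at < 0:
        return None
    rbs_end = rbs_at + len(rbs)
    start_at = seq.find(start_codon, rbs_end)  # search only downstream of the motif
    if start_at < 0:
        return None
    return start_at - rbs_end


def spacer_in_range(seq: str, rbs: str, low: int = 3, high: int = 11) -> bool:
    """True if a spacer exists and its length lies within [low, high]."""
    gap = spacer_length(seq, rbs)
    return gap is not None and low <= gap <= high


# Hypothetical sequence: motif, six-nucleotide spacer, then the ATG initiation codon.
example = "TTAGGAGGTAAATCATGGCT"
print(spacer_length(example, "AGGAGG"), spacer_in_range(example, "AGGAGG"))
```

Because the text notes that the exact spacer length is not critical, a range test rather than a single required value is the natural way to encode it.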

Many recombinant DNA techniques employ two classes of
compounds, transfer vectors and restriction enzymes, to be
discussed in turn. A transfer vector is a DNA molecule which
contains, inter alia, genetic information which ensures its
own replication when transferred to a host microorganism
strain. Examples of transfer vectors commonly used in
bacterial genetics are plasmids and the DNA of certain
bacteriophages. Although plasmids have been used as the
transfer vectors for the work described herein, it will be
understood that other types of transfer vectors may be
employed. Plasmid is the term applied to any autonomously
replicating DNA unit which might be found in a microbial cell
other than the genome of the host cell itself. A plasmid is
not genetically linked to the chromosome of the host cell.
Plasmid DNAs exist as double-stranded ring structures
generally on the order of a few million daltons molecular
weight, although some are greater than 10^8 daltons in
molecular weight. They usually represent only a small percent
of the total DNA of the cell. Transfer vector DNA is usually
separable from host cell DNA by virtue of the great
difference in size between them. Transfer vectors carry
genetic information enabling them to replicate within the
host cell, in most cases independently of the rate of host
cell division. Some plasmids have the property that their
replication rate can be controlled by the investigator by
variations in the growth conditions. By appropriate
techniques, the plasmid DNA ring may be opened, a fragment of
heterologous DNA inserted, and the ring reclosed, forming an
enlarged molecule comprising the inserted DNA segment.
Bacteriophage DNA may carry a segment of heterologous DNA
inserted in place of certain nonessential phage genes. Either
way, the transfer vector serves as a carrier or vector for an
inserted fragment of heterologous DNA.

Transfer is accomplished by a process known as
transformation. During transformation, host cells mixed with
plasmid DNA incorporate entire plasmid molecules into the
cells. Although the mechanics of the process remain obscure,
it is possible to maximize the proportion of host cells
capable of taking up plasmid DNA, and hence of being
transformed, by certain empirically determined treatments.
Once a cell has incorporated a plasmid, the latter is
replicated within the cell and the plasmid replicas are
distributed to the daughter cells when the cell divides. Any
genetic information contained in the nucleotide sequence of
the plasmid DNA can, in principle, be expressed in the host
cell. Typically, a transformed host cell is recognized by its
acquisition of traits carried on the plasmid, such as
resistance to certain antibiotics. Different plasmids are
recognizable by the different capabilities or combination of
capabilities which they confer upon the host cell containing
them. Any given plasmid may be made in quantity by growing a
pure culture of cells containing the plasmid and isolating
the plasmid DNA therefrom.

Restriction endonucleases are hydrolytic enzymes capable
of catalyzing site-specific cleavage of DNA molecules. The
locus of restriction endonuclease action is determined by
the existence of a specific nucleotide sequence. Such a
sequence is termed the recognition site for the restriction
endonuclease. Restriction endonucleases from a variety of
sources have been isolated and characterized in terms of the
nucleotide sequence of their recognition sites. Some
restriction endonucleases hydrolyze the phosphodiester bonds
on both strands at the same point, producing blunt ends.
Others catalyze hydrolysis of bonds separated by a few
nucleotides from each other, producing free single-stranded
regions at each end of the cleaved molecule. Such
single-stranded ends are self-complementary, hence cohesive,
and may be used to rejoin the hydrolyzed DNA. Since any DNA
susceptible of cleavage by such an enzyme must contain the
same recognition site, the same cohesive ends will be
produced, so that it is possible to join heterologous
sequences of DNA which have been treated with a restriction
endonuclease to other sequences similarly treated. See
Roberts, R.J., Crit. Rev. Biochem. 4, 123 (1976). Restriction
sites are relatively rare; however, the general utility of
restriction endonucleases has been greatly amplified by the
chemical synthesis of double-stranded oligonucleotides
bearing the restriction site sequence. Therefore virtually
any segment of DNA can be coupled to any other segment simply
by attaching the appropriate restriction oligonucleotide to
the ends of the molecule, and subjecting the product to the
hydrolytic action of the appropriate restriction
endonuclease, thereby producing the requisite cohesive ends.
See Heyneker, H.L., et al., Nature 263, 748 (1976), and
Scheller, R.H., et al., Science 196, 177 (1977). An important
feature of the distribution of restriction endonuclease
recognition sites is the fact that they are randomly
distributed with respect to reading frame. Consequently,
cleavage by a restriction endonuclease may occur between
adjacent codons or it may occur within a codon.
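For illustration only, the Python sketch below models one well-characterized enzyme, EcoRI, which is not named in the text: it recognizes GAATTC and cuts each strand between the G and the first A, leaving a four-nucleotide single-stranded AATT end that is its own reverse complement, and hence cohesive. The sketch also reports, for a coding strand whose reading frame begins at position 0, whether each top-strand cut falls between codons or within one. The function names and example sequences are hypothetical.

```python
SITE = "GAATTC"   # EcoRI recognition site (assumed example enzyme)
CUT = 1           # EcoRI cuts G^AATTC, i.e. after the first base of the site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_positions(seq: str) -> list[int]:
    """Number of nucleotides to the left of each top-strand cut."""
    positions, start = [], 0
    while (hit := seq.find(SITE, start)) >= 0:
        positions.append(hit + CUT)
        start = hit + 1
    return positions


def cut_between_codons(position: int, frame_start: int = 0) -> bool:
    """True if a cut at `position` falls between codons of a frame beginning at frame_start."""
    return (position - frame_start) % 3 == 0


overhang = SITE[CUT:len(SITE) - CUT]   # "AATT", the single-stranded end
print(overhang, reverse_complement(overhang) == overhang)
for dna in ("AAGAATTCT", "ATGGAATTCAAA"):
    for pos in cut_positions(dna):
        where = "between codons" if cut_between_codons(pos) else "within a codon"
        print(dna, pos, where)
```

Whether a given site lands on a codon boundary depends only on its position modulo 3, which is why the frame of an inserted fragment must be checked case by case.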

More general methods of DNA cleavage or of end
sequence modification are available. A variety of
nonspecific endonucleases may be used to cleave DNA randomly,
as discussed, infra. End sequences may be modified by
creation of oligonucleotide tails of dA on one end and dT
at the other, or of dG and dC, to create sites for joining
without the need for specific linker sequences.

The term "expression" is used in recognition of the
fact that an organism seldom if ever makes use of all its
genetically endowed capabilities at any given time. Even
in relatively simple organisms such as bacteria, many
proteins which the cell is capable of synthesizing are not
synthesized, although they may be synthesized under
appropriate environmental conditions. When the protein
product coded by a given gene is synthesized by the organism,
the gene is said to be expressed. If the protein product is
not made, the gene is not expressed. Normally, the expression
of genes in E. coli is regulated as described generally,
infra, in such manner that proteins whose function is not
useful in a given environment are not synthesized and
metabolic energy is conserved.

The means by which gene expression is controlled in E.
coli and yeast is well understood, as the result of extensive
studies over the past twenty years. See, generally, Hayes,
W., The Genetics of Bacteria and Their Viruses, 2d edition,
John Wiley & Sons, Inc., New York (1968), and Watson, J.D.,
The Molecular Biology of the Gene, 3d edition, Benjamin,
Menlo Park, California (1976). These studies have revealed
that several genes, usually those coding for proteins
carrying out related functions in the cell, may be found
clustered together in continuous sequence. The cluster is
called an operon. All genes in the operon are transcribed in
the same direction, beginning with the codons coding for the
N-terminal amino acid of the first protein in the sequence
and continuing through to the C-terminal end of the last
protein of the operon. At the beginning of the operon,
proximal to the N-terminal amino acid codon, there exists a
region of the DNA, termed the control region, which includes
a variety of controlling elements including the operator,
promoter and sequences for the binding of ribosomes. The
function of these sites is to permit the expression of those
genes under their control to be responsive to the needs of
the organism.
For example, those genes coding for enzymes required
exclusively for utilization of lactose are normally not
appreciably expressed unless lactose or an analog thereof is
actually present in the medium. The control region functions
that must be present for expression to occur are the
initiation of transcription and the initiation of
translation. The minimal requirements for independent
expression of a coding segment are therefore a promoter, a
ribosomal binding site, and a 3-11 nucleotide spacer segment.
The nucleotide sequences contributing these functions are
relatively short, such that the major portion of an
expression control segment might be on the order of 15 to 25
nucleotides in length. Expression of the first gene in the
sequence is initiated by the initiation of transcription and
translation at the position coding for the N-terminal amino
acid of the first protein of the operon. The expression of
each gene downstream from that point is also initiated in
turn, at least until a termination signal or another operon
is encountered with its own control region, keyed to respond
to a different set of environmental cues. While there are
many variations in detail on this general scheme, the
important fact is that, to be expressed in a host such as E.
coli, or a eucaryote such as yeast, a gene must be properly
located with respect to a control region having initiator of
transcription and initiator of translation functions.

It has been demonstrated that genes not normally part
of a given operon can be inserted within the operon and
controlled by it. The classic demonstration was made by
Jacob, F., et al., J. Mol. Biol. 13, 704 (1965). In that
experiment, genes coding for enzymes involved in a purine
biosynthesis pathway were transferred to a region controlled
by the lactose operon. The expression of the purine
biosynthetic enzyme was then observed to be repressed in the
absence of lactose or a lactose analog, and was rendered
unresponsive to the environmental cues normally regulating
its expression.

In addition to the operator region regulating the
initiation of transcription of genes downstream from it,
there are known to exist codons which function as stop
signals, indicating the C-terminal end of a given protein.
See Table 2. Such codons are known as termination signals and
also as nonsense codons, since they do not normally code for
any amino acid. Deletion of a termination signal between
structural genes of an operon creates a fused gene which
could result in the synthesis of a chimeric or fusion protein
consisting of two amino acid sequences coded by adjacent
genes, joined by a peptide bond. That such chimeric proteins
are synthesized when genes are fused was demonstrated by
Benzer, S., and Champe, S.P., Proc. Nat. Acad. Sci. USA 48,
1114 (1962).
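The consequence of deleting a termination signal can be mimicked with a toy Python model, offered only as an illustration. Two hypothetical genes in the same reading frame are joined either with or without an intervening TAA termination codon; translation proceeds from the first codon and halts at the first termination signal. Control regions and ribosome reinitiation are deliberately ignored, and all names and sequences are invented for the example.

```python
# Standard genetic code laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(dna: str) -> str:
    """Translate from position 0 and stop at the first termination codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene_a = "ATGAAAGGC"   # hypothetical first gene: Met-Lys-Gly
gene_b = "ATGGCTGAA"   # hypothetical second gene: Met-Ala-Glu
stop = "TAA"           # termination signal between the genes

operon = gene_a + stop + gene_b + "TGA"
fused = gene_a + gene_b + "TGA"        # same operon with the first stop deleted
print(translate_until_stop(operon))    # product of the first gene only
print(translate_until_stop(fused))     # chimeric product spanning both genes
```

The fusion works here only because both genes are multiples of three nucleotides long, so the second gene stays in the reading frame established by the first.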

Once a given gene has been isolated, purified and 
inserted in a transfer vector, the overall result of which
is termed the cloning of the gene, its availability in 
substantial quantity is assured. The cloned gene is trans- 
ferred to a suitable microorganism, wherein the gene repli- 
cates as the microorganism proliferates and from which the 
gene may be reisolated by conventional means. Thus is prov- 
ided a continuously renewable source of the gene for further 
manipulations, modifications and transfers to other vectors 
or other loci within the same vector. 

Expression has been obtained in the prior art by
transferring the cloned gene, in proper orientation and
reading frame, into a control region, such that read-through
from the host gene results in synthesis of a chimeric protein
comprising the amino acid sequence coded by the cloned gene.
Techniques for constructing an expression transfer vector
having the cloned gene in proper juxtaposition with a control
region are described in Polisky, B., et al., Proc. Nat. Acad.
Sci. USA 73, 3900 (1976); Itakura, K., et al., Science 198,
1056 (1977); Villa-Komaroff, L., et al., Proc. Nat. Acad.
Sci. USA 75, 3727 (1978); Mercereau-Puijalon, O., et al.,
Nature 275, 505 (1978); Chang, A.C.Y., et al., Nature 275,
617 (1978); and in copending U.S. Application Serial No.
933,035 by Rutter, et al., filed August 11, 1978, said
application incorporated herein by reference as though set
forth in full.

As described in Serial No. 933,035, the cloned gene is
joined to a host control fragment in order to obtain
expression of the gene. This control fragment may consist
of no more than that part of the control region providing
for initiation of transcription and initiation of
translation, or may additionally include a portion of a
structural gene, depending on the location of the insertion
site. Thus, the expression product would be either a protein
coded by the cloned gene, hereinafter referred to as a
non-fusion protein, or a fusion protein coded in part by the
procaryotic structural gene, in part by the cloned gene, and
in part by any intervening nucleotide sequences linking the
two genes. The peptide bond between the desired protein or
peptide, comprising the C-terminal portion of the fusion
protein, and the remainder is herein termed the "junction
bond".

After the protein has been produced, it must then be
purified. Several advantages and disadvantages exist for
the purification of either the non-fusion protein or the
fusion protein. The non-fusion protein is produced within
the cell. As a consequence, the cells must be lysed or
otherwise treated in order to release the non-fusion protein.
The lysate will contain all of the proteins of the cell in
addition to the non-fusion protein, which may make
purification of the protein difficult. Another consequence is
that the non-fusion protein may be recognized as a foreign
protein and undergo rapid degradation within the cell.
Therefore non-fusion proteins may not be obtainable in
reasonable yields. A major advantage of a non-fusion protein
is that the protein itself is the desired final product.

The stability of the expression product is frequently
enhanced by expression of a fusion protein. The host portion
of the fusion protein frequently stabilizes the expression
product against intracellular degradation. Further, it is
often possible to choose a host protein which is protected
from degradation by compartmentalization or by excretion
from the cell into the growth medium. The cloned gene can
then be attached to the host gene for such a protein. A
fusion protein consisting of an excreted or compartmentalized
host protein (N-terminal) and a eucaryotic protein
(C-terminal) is likely to be similarly excreted from the
cell or compartmentalized within it, because the signal
sequence of amino acids that confers secretability is on the
N-terminal portion of the fusion protein. In the case of a
fusion protein excreted into the cell medium, purification is
greatly simplified. In some instances, the host portion may
have distinctive physical properties that permit the use of
simple purification procedures. A major disadvantage of the
fusion protein is that the host protein must be removed from
the fusion protein in order for the eucaryotic protein to be
obtained.

Direct expression as a non-fusion protein will generally
be preferred if the protein is stable in the host cell. In
many instances, the disadvantage of having to purify the
expression product from a cell lysate will be overcome by
the advantage of not having to employ specific cleavage
means to remove an N-terminal portion. Most advantageously,
as provided herein by the present invention, the desired
protein may be expressed as a fusion protein comprising an
N-terminal sequence having distinctive physical properties
useful for purification and provided with a structure at the
junction point with the desired C-terminal portion such that
the junction bond, as defined, supra, can be cleaved by
means which do not appreciably affect the desired C-terminal
protein or peptide.

Many methods for chemical cleavage of peptides have been
proposed and tested. Spande, T.F., et al., Adv. Protein Chem.
24, 97 (1970). However, many of these are nonspecific, i.e.,
they cleave at many sites in a protein. See also a brief
discussion in The Proteins, 3rd Ed., Neurath, H. and Hill,
R.L., Ed., Academic Press, Vol. 3, pp. 50-57 (1977).
Hydrolysis of peptide bonds is catalyzed by a variety of
known proteolytic enzymes. See The Enzymes, 3rd Ed., Boyer,
P.D., Ed., Academic Press, Vol. III (1971); Methods in
Enzymology, Vol. XIX, Perlmann, G.E. and Lorand, L., Ed.,
Academic Press (1970); and Methods in Enzymology, Vol. XLV,
Lorand, L., Ed., Academic Press (1976). However, many
proteolytic enzymes are also nonspecific with respect to the
cleavage site.
The specificity of each chemical or enzymatic means for
cleavage is generally described in terms of amino acid
residues at or near the hydrolyzed peptide bond. The
hydrolysis of a peptide bond in a protein or polypeptide is
herein termed a cleavage of the protein or polypeptide at the
site of the hydrolyzed bond. The peptide bonds which are
hydrolyzed by chemical or enzymatic means are generally known
(see the above-identified references). For example, trypsin
(3.4.4.4) cleaves on the carboxyl side of an arginine or
lysine residue. (The number in parentheses after the enzyme
is its specific identifying nomenclature as established by
the International Union of Biochemistry.) Thus, trypsin is
said to be specific for arginine or lysine. Since trypsin
hydrolyzes only on the carboxyl side of arginine or lysine
residues, it is said to have a narrow specificity. Pepsin
(3.4.4.1), on the other hand, has a broad specificity and
will cleave on the carboxyl side of most amino acids, but
preferably phenylalanine, tyrosine, tryptophan, cysteine,
cystine or leucine residues. A few specific chemical
cleavage reactions are known. For example, CNBr will cleave
only at methionine residues under appropriate conditions.
However, the difficulty with all specific cleavage means,
whether chemical or enzymatic, which depend upon the
existence of a single amino acid residue at or near the
cleavage point is that such methods will only be useful in
specific instances where it is known that no such residue
occurs internally in the amino acid sequence of the desired
protein. The larger the desired protein, the greater the
likelihood that the sensitive residue will occur internally.
Therefore, a technique generally useful for cleaving fusion
proteins at a desired point is preferably based upon the
existence of a sequence of amino acids at the junction bond
which has a low likelihood of occurrence internally in the
desired protein.
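The argument can be made roughly quantitative. Assuming, purely for illustration, that the twenty amino acids occur with equal frequency and independently at each position (real proteins depart from both assumptions), the chance that a protein of n residues contains no copy of a particular sequence of k residues is approximately (1 - 20^-k)^(n-k+1). The short Python sketch below evaluates this estimate; the function name is hypothetical.

```python
def prob_site_absent(k: int, n: int, alphabet: int = 20) -> float:
    """Approximate probability that a specific k-residue sequence never occurs
    in a random n-residue protein.

    Assumes uniform, independent residue frequencies and treats the
    n - k + 1 overlapping windows as independent trials.
    """
    p_window = (1.0 / alphabet) ** k
    windows = max(n - k + 1, 0)
    return (1.0 - p_window) ** windows


for n in (100, 300):
    single = prob_site_absent(1, n)   # e.g. one sensitive residue such as Met for CNBr
    triple = prob_site_absent(3, n)   # a specific three-residue sequence
    print(n, round(single, 4), round(triple, 4))
```

Under these assumptions a 100-residue protein almost certainly contains a given single residue (the estimated chance of its absence is below 1%), whereas a particular tripeptide is absent about 99% of the time, and lengthening the protein erodes the single-residue case far faster.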

The specificity for the site of the hydrolyzed peptide
bond is generally termed the primary specificity of the
enzyme. Thus, trypsin has a primary specificity for arginine
and lysine residues. The primary specificity of enzymes has
been the subject of considerable investigation. It has been
determined that a particular enzyme will recognize and bind
the amino acid residue within a protein molecule
corresponding to the enzyme's primary specificity and cleave
the protein at that point. The part of an enzyme which
recognizes and binds the substrate and catalyzes the reaction
is known as the active site. For example, trypsin would
recognize and bind an arginine residue within a protein and
cleave the protein on the carboxyl side of the arginine. For
many years it was thought that only the amino acid residues
corresponding to the primary specificity affected the
specificity of hydrolysis of the peptide bond by the enzyme.
However, it has been noted that amino acids in the immediate
vicinity of the site of hydrolysis may affect the binding
affinity of the enzyme at that site. Several examples of this
effect can be shown for trypsin. Considering the sequence
-x-Arg-y-, where x and y are amino acids, it has been found
that the binding affinity of trypsin at the Arg-y bond is
significantly reduced when x is Glu or Asp. Similarly, it has
been shown that the binding affinity at an arginine or lysine
residue in repetitive sequences of lysine, arginine or a
combination thereof is greater than if a single arginine or
lysine residue were present. That is, the enzyme
preferentially binds at -Arg-Arg-x compared with -y-Arg-x.
Also, trypsin does not appear to hydrolyze the -Arg-Pro- or
-Lys-Pro- peptide bond. See Kasper, C.B., at p. 157 in
Protein Sequence Determination, Needleman, S.B., Ed.,
Springer-Verlag, New York (1970).
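The primary specificity of trypsin, together with the -Arg-Pro- and -Lys-Pro- exception noted above, can be captured in a simplified Python model. This is a sketch only: it ignores the reduced affinity when Glu or Asp precedes the arginine and the enhanced binding at repeated basic residues, and the function names and example peptide (in one-letter amino acid code) are hypothetical.

```python
def trypsin_bonds(protein: str) -> list[int]:
    """Indices i of peptide bonds protein[i]-protein[i+1] predicted to be cleaved:
    the carboxyl side of Arg (R) or Lys (K), unless the next residue is Pro (P)."""
    return [
        i for i in range(len(protein) - 1)
        if protein[i] in "RK" and protein[i + 1] != "P"
    ]


def trypsin_digest(protein: str) -> list[str]:
    """Split a one-letter protein sequence at every predicted cleavage bond."""
    fragments, start = [], 0
    for i in trypsin_bonds(protein):
        fragments.append(protein[start:i + 1])
        start = i + 1
    fragments.append(protein[start:])
    return fragments


# The Lys-Pro bond is left intact; the Arg-Asp and Lys-Trp bonds are cleaved.
print(trypsin_digest("GAKPLRDEKW"))
```

Because every internal Arg or Lys not followed by Pro is a potential cut, a model of this kind makes plain why trypsin alone is unsuitable for releasing a protein that contains such residues.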

Recently, it has also been determined that amino acids 
in the vicinity of the site of hydrolysis will also be 
recognized and bound by the enzyme. For example, Schechter,
I. et al., Biochem. Biophys. Res. Comm. 27, 157 (1967)
reported that papain (3.4.4.10) binds several amino acid
residues in its active site, as determined from the
hydrolysis of peptides of various lengths. An active site
which binds several amino acids is often termed an extended
active site. The specificity of an enzyme for the additional
amino acids not at the immediate site of hydrolysis is
sometimes termed the secondary specificity of the enzyme. It
has now been shown that many enzymes have extended active
sites. Additional examples of enzymes having extended active
sites include: elastase (3.4.4.7) - Thompson, R.C. et al.,
Proc. Nat. Acad. Sci. USA 67, 1734 (1970); α-chymotrypsin
(3.4.4.5) - Bauer, C.A., et al., Biochem. 15, 1291 and 1296
(1976); chymosin (3.4.23.4) - Visser, S., et al., Biochim.
Biophys. Acta 438, 265 (1976); and enterokinase (3.4.4.8) -
Maroux, S., et al., J. Biol. Chem. 246, 5031 (1971). (See
also Fruton, J.S., Cold Spring Harbor Conf. Cell Prolif. 2,
33 (1975).) The extended active site appears at least to
increase the catalytic efficiency of the enzyme. It may
also increase the binding affinity of the enzyme for the
peptide. See Fruton, J.S., supra. For example, Schechter, I.
et al., Biochem. Biophys. Res. Comm. 32, 898 (1969) found
that a phenylalanine in the sequence -x-Phe-y-z-, where x,
y and z are amino acids, enhances the susceptibility of the
peptide to hydrolysis by papain and directs the enzymatic
attack at the y-z peptide bond. Valine and leucine may
provide similar results when substituted for Phe in the
above sequence. This could explain the broad specificity of
papain. See Glazer, A.N. et al. at p. 501 in The Enzymes,
supra. Thus, an enzyme may have a narrow specificity as a
result of its primary specificity alone or in combination
with its secondary specificity (i.e., the enzyme has an
extended active site).
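
The papain observation above can be expressed as a simple scanning rule. The Python sketch below is illustrative only: it assumes that the simplified pattern -x-Phe-y-z- (with Val or Leu accepted in place of Phe) is the sole determinant of the attacked bond, which ignores everything else known about papain, and the function name is hypothetical.

```python
# Simplified rule taken from the text: in -x-Phe-y-z-, papain attacks the y-z bond.
# Val and Leu are accepted in place of Phe. All other determinants are ignored.
DIRECTING_RESIDUES = {"F", "V", "L"}  # one-letter codes for Phe, Val, Leu


def papain_candidate_bonds(peptide: str) -> list:
    """Return indices i where the bond peptide[i]-peptide[i+1] (the y-z bond)
    follows a directing residue at i-1 that is itself preceded by a residue x."""
    bonds = []
    for i in range(2, len(peptide) - 1):
        if peptide[i - 1] in DIRECTING_RESIDUES:
            bonds.append(i)
    return bonds


if __name__ == "__main__":
    peptide = "AGFAGA"  # hypothetical test peptide: x=G, Phe, y=A, z=G
    for i in papain_candidate_bonds(peptide):
        print(f"candidate bond {peptide[i]}-{peptide[i + 1]} after residue {i + 1}")
```

Indices are 0-based, so for the hypothetical peptide the flagged bond is the Ala-Gly bond immediately after the Phe-Ala pair.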

The present invention provides for the procaryotic or
eucaryotic expression of a cloned coding segment such that
the desired protein is produced, either as a fusion protein
or a non-fusion protein, as desired, and may be provided with
specific additional amino acid sequences to permit specific
cleavage at the junction bond of a fusion protein and to
permit rapid purification. The general invention provides a
number of options for the investigator, depending on the size
and function of the desired protein, and upon the relative
advantages of expression as a fusion or non-fusion protein,
according to principles well known in the art, as discussed
supra.

To provide a generally useful means for specific cleavage
of the junction bond, a chemical or enzymatic cleavage means
having a narrow specificity will not be suitable except in
special cases. A cleavage means is not suitable if its
cleavage site occurs within the eucaryotic protein of the
fusion protein. For example, a eucaryotic protein may contain
several arginine and/or lysine residues. Trypsin would cleave
on the carboxyl side of these residues. Since cleavage would
occur within the eucaryotic protein, trypsin would not be
suitable for use in the present invention. This is also true
for many chemical cleavage means. Thus, to obtain more
specific cleavage, it may be necessary to utilize a cleavage
means whose cleavage site lies in a specific amino acid
sequence of two or more amino acid residues. For example, it
would be desirable for the cleavage means to be specific for
an amino acid sequence -X-Y-Z- and to cleave on the carboxyl
side of the Z residue. The probability of the same sequence
occurring within the eucaryotic protein would be very small,
and therefore the probability of cleavage within the
eucaryotic protein would also be very small. The entire
eucaryotic protein can then be removed and purified.
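
How small "very small" is can be estimated under a deliberately crude assumption: every position of the desired protein is drawn independently and uniformly from the 20 amino acids. Real compositions are skewed, so the numbers from this Python sketch are order-of-magnitude guides only; the function name is hypothetical.

```python
def expected_chance_matches(protein_length: int, motif_length: int,
                            alphabet_size: int = 20) -> float:
    """Expected number of chance occurrences of one fixed motif, assuming every
    residue is drawn independently and uniformly from `alphabet_size` amino acids."""
    windows = max(protein_length - motif_length + 1, 0)
    return windows / alphabet_size ** motif_length


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, expected_chance_matches(300, k))
```

For a hypothetical 300-residue protein this gives 15 expected matches for a single fixed residue, about 0.75 for a fixed dipeptide, and fewer than 0.04 for a fixed tripeptide, which is the sense in which each added recognition residue makes internal cleavage less likely.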

The present invention is designed such that, when a
fusion protein is expressed, a specific cleavage sequence of
one or more amino acids is inserted between the host portion
and the eucaryotic portion of the fusion protein. If the
sequence of the eucaryotic portion is known, it is possible to
select a specific cleavage sequence of only one amino acid
residue, so long as that residue does not appear in the
eucaryotic protein. It is preferred, however, to utilize a
specific cleavage sequence which contains two or more amino
acid residues, sometimes referred to herein as an extended
specific cleavage sequence. This type of sequence takes
advantage of the extended active sites of various enzymes.
By utilizing an extended specific cleavage sequence, it is
highly probable that cleavage will occur only at the
desired site, the junction bond, and not within the desired
protein. The present invention is important in recombinant
DNA technology: by inserting a specifically recognized amino
acid sequence between the host protein portion and the
desired portion of a fusion protein, it is now possible to
specifically cleave the desired portion out of the fusion
protein without further affecting the desired portion.

For practical purposes, as contemplated by the present
invention, the specificity of cleavage at the junction need
not be all or nothing with respect to other potential
cleavage sites in the desired protein. It suffices if the
junction bond cleavage site is sufficiently favored
kinetically, either due to increased binding affinity or
faster turnover, that the junction bond is cleaved
preferentially with respect to other sites, such that a
reasonable yield of the desired protein can be obtained.
Reaction conditions of temperature, buffer, ratio of enzyme
to substrate, reaction time and the like can be selected so
as to maximize the yield of the desired protein, as a matter
of ordinary skill in the art.
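
The trade-off in choosing the reaction time can be made concrete with a simple model, offered as an assumption rather than as part of the invention: treat cleavage at the junction (rate constant k_j) and at one competing internal site (rate constant k_i) as independent first-order reactions. The fraction of molecules released at the junction and still intact is then f(t) = (1 - exp(-k_j t)) exp(-k_i t), and setting df/dt = 0 gives t* = ln((k_j + k_i)/k_i)/k_j. The Python sketch below uses hypothetical rate constants.

```python
import math


def intact_yield(t: float, k_junction: float, k_internal: float) -> float:
    """Fraction of fusion molecules cut at the junction but not yet cut internally,
    treating the two sites as independent first-order reactions."""
    return (1.0 - math.exp(-k_junction * t)) * math.exp(-k_internal * t)


def optimal_time(k_junction: float, k_internal: float) -> float:
    """Digestion time that maximizes intact_yield, from setting its derivative to zero:
    t* = ln((k_junction + k_internal) / k_internal) / k_junction."""
    if k_junction <= 0 or k_internal <= 0:
        raise ValueError("rate constants must be positive")
    return math.log((k_junction + k_internal) / k_internal) / k_junction


if __name__ == "__main__":
    kj, ki = 1.0, 0.01  # hypothetical rate constants (per hour), junction favored 100-fold
    t_best = optimal_time(kj, ki)
    print(f"stop at about {t_best:.2f} h, yield {intact_yield(t_best, kj, ki):.3f}")
```

With a 100-fold kinetic preference for the junction, the model suggests stopping after roughly 4.6 hours, at a yield of about 0.95; digesting much longer only erodes the yield through internal cleavage.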

One enzyme which may cleave at a specific cleavage site
has been called a signal peptidase. For several eucaryotic
and procaryotic proteins, the initial translation product
is not the protein itself, but the protein with approximately
20 additional amino acids on the amino terminus of the
protein. The additional amino acid sequence is called a
signal peptide. The signal peptide is thought to be a
specific signal for the vectorial transport of the
synthesized protein into the endoplasmic reticulum and is
cleaved away from the protein during this phase. See Blobel,
G. et al., J. Cell Biol. 67, 835 (1975). A specific cleavage
enzyme, i.e., signal peptidase, has been observed in a
cell-free system which hydrolyzes the peptide bond between
the signal peptide and the active protein in association
with passage through a cell membrane. See Blobel, G., et
al., Proc. Nat. Acad. Sci. USA 75, 361 (1978).

The present invention provides for the synthesis of a
specific cleavage linker which can be attached to the end
of the isolated DNA segment coding for the N-terminus of
the protein prior to insertion of the segment into the
transfer vector. The specific cleavage linker codes for an
amino acid sequence which contains a specific cleavage site
which does not occur within the desired protein. Thus,
specific cleavage within the linker amino acid sequence
results in the isolation of the desired protein from the
fusion protein. An advantage of the present invention is
that cleavage occurs at the amino-terminal side of the first
amino acid of the desired protein. Another advantage is that
little of the desired protein is degraded during the
cleavage procedure.

For the purpose of providing expression as a non-fusion
protein, the present invention provides synthetic
oligonucleotide linkers comprising a promoter, a ribosomal
binding site, and a 3-11 nucleotide spacer. This linker,
coupled with a coding segment, provides for direct
expression of the coding segment when inserted into a
transfer vector and used to transform a suitable host. Using
such a linker, the coding segment may be expressed even
though inserted in a "silent" region of the vector, thus
increasing the range of choice of suitable insertion sites.
Preferably, direct expression of the coding segment is
obtained without resorting to a synthetic promoter segment.
A ribosomal binding site linker, together with a 3-11
nucleotide spacer, directs the reinitiation of translation
of mRNA whose transcription is initiated at a naturally
occurring promoter site. Therefore, as long as the coding
segment and expression linker are inserted in a transfer
vector gene under naturally occurring promoter control,
reinitiation at the inserted ribosomal binding site results
in direct expression of the attached coding segment. Most
preferably, the insertion is made adjacent to the existing
promoter, between it and the structural gene it normally
controls.
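
The spacer requirement can be checked mechanically. The Python sketch below uses the example procaryotic ribosomal binding site sequence TAGGAGGAGCC that this document gives later, and assumes (our simplifications) an exact match to that sequence, that the first downstream ATG is the intended start codon, and that the spacer is counted from the last base of the binding site to the A of ATG. Real ribosome binding tolerates mismatches, so this is a screening aid, not a predictor of expression; the function names are hypothetical.

```python
from typing import Optional

# Example procaryotic ribosomal binding site given in this document (plus strand).
RBS = "TAGGAGGAGCC"


def spacer_length(dna: str, rbs: str = RBS) -> Optional[int]:
    """Nucleotides between the end of the first exact RBS match and the first
    downstream ATG, or None if either is absent."""
    rbs_start = dna.find(rbs)
    if rbs_start < 0:
        return None
    rbs_end = rbs_start + len(rbs)
    atg_start = dna.find("ATG", rbs_end)
    if atg_start < 0:
        return None
    return atg_start - rbs_end


def directs_expression(dna: str, low: int = 3, high: int = 11) -> bool:
    """True if an ATG lies within the 3-11 nucleotide spacer window after the RBS."""
    spacer = spacer_length(dna)
    return spacer is not None and low <= spacer <= high


if __name__ == "__main__":
    construct = "GG" + RBS + "AAAAAAA" + "ATGGCTTAA"  # hypothetical insert
    print(spacer_length(construct), directs_expression(construct))
```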

For the purpose of improving purification of the fusion
or non-fusion protein, the present invention provides a
linker coding for amino acid sequences which function to
enhance ease of purification. For example, a polyanionic
amino acid segment or a polycationic or hydrophobic segment
will be tightly bound by a variety of known solid phase
adsorbents or column materials. Specific amino acid
sequences recognizable by specific binding substances can be
incorporated on either end of the desired protein to render
it purifiable by affinity chromatography. Such purification
segments can be used in conjunction with a specific cleavage
segment to provide for simple quantitative purification of
fusion or non-fusion proteins, followed by specific cleavage
of the purification segment and quantitative removal thereof.

The foregoing purposes are achieved in the present
invention according to the properties of each system, to
solve the individual problems presented in preparing the
desired protein. The principles of the present invention
as discussed herein provide generally applicable means for
expressing a coding segment as a fusion protein or a
non-fusion protein, with or without a purification segment,
specifically cleavable from any protein or peptide not part
of the desired expression product.

SUMMARY OF THE INVENTION
The present invention is based upon a general principle
of providing specific oligonucleotide segments ("linkers",
herein) to be attached in sequence to a cloned DNA coding
segment. The linkers of the present invention confer
desired functional properties on the expression of the
protein coded by the coding sequence. Using linkers of the
present invention, the desired protein may be expressed
either as a fusion or non-fusion protein. A linker coding
for an additional sequence of amino acids may be attached,
the sequence being chosen to provide properties exploitable
in a simplified purification process. A linker coding for an
amino acid sequence of the extended specific cleavage site of
a proteolytic enzyme is provided, as well as specific
cleavage linkers for simple specific cleavage sites.

The oligonucleotide linkers used are termed "segments"
herein. Thus, the oligonucleotide coding for a specific
cleavage site is termed a specific cleavage segment; that
coding for initiation of transcription and translation is
termed an expression control segment; that coding for
reinitiation of translation is termed an expression segment;
and that coding for specific purification is termed a
purification segment. The cloned nucleotide sequence coding
for the desired protein is termed the coding segment. The
expression product is a protein or polypeptide bearing
various identifiable portions. Where the desired protein or
peptide is expressed as a fusion protein, the N-terminal
amino acid sequence contributed by the host or transfer
vector genome is termed the host portion; where a specific
cleavage linker has been employed, the amino acid sequence
resulting from its expression is termed the specific
cleavage portion; where a purification segment has been
attached, its expression product is termed the purification
portion; and the portion coded by the cloned coding segment
is termed the desired protein, which term will be used herein
to denote any size of polypeptide, polyamino acid, protein or
protein fragment specified by the coding segment.

It is contemplated that the linkers of the present
invention may be attached to either end of the coding
segment, to provide the desired portion at either the amino
end or the carboxyl end of the desired protein. It will be
understood that for the expression of any portion attached
to the carboxyl end of the desired protein, the coding
segment must not contain an in-frame termination codon. It
will further be understood that linkers designed for the
expression of a portion attached to the carboxyl end of the
desired protein must include a termination codon,
appropriately located at the end of the segment whose
expression is desired.
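
The two termination-codon rules above can be expressed as a design check. In the Python sketch below, reading frame 0 of the coding segment is taken as the translation frame, and the linker is assumed to follow the coding segment directly. The check that the coding segment is a whole number of codons is our addition (it keeps the linker in frame) rather than a rule stated in the text, and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list:
    """Split into complete codons in reading frame 0; a trailing partial codon is dropped."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def carboxyl_fusion_problems(coding_segment: str, c_terminal_linker: str) -> list:
    """List violations of the design rules for a portion added at the
    carboxyl end of the desired protein."""
    problems = []
    if len(coding_segment) % 3 != 0:
        problems.append("coding segment would shift the linker out of frame")
    if any(c in STOP_CODONS for c in codons(coding_segment)):
        problems.append("coding segment contains an in-frame termination codon")
    linker_codons = codons(c_terminal_linker)
    if not linker_codons or linker_codons[-1] not in STOP_CODONS:
        problems.append("linker does not end with a termination codon")
    elif any(c in STOP_CODONS for c in linker_codons[:-1]):
        problems.append("linker has a premature termination codon")
    return problems


if __name__ == "__main__":
    # Hypothetical coding segment and a C-terminal linker ending in TAA.
    print(carboxyl_fusion_problems("ATGGCTGCA", "GATGATGATGATAAATAA"))
```

An empty list means neither rule is violated; each string in a non-empty list names one rule that the proposed construct breaks.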

The present invention opens a variety of options for
the expression of a cloned coding segment, depending on the
properties of the desired protein and of the host expressing
it. The host may be either procaryotic or eucaryotic.
Where the desired protein is small or unstable in the host,
it may be preferred to express a fusion protein. The use
of a specific cleavage linker of the present invention will
then enable the subsequent specific removal of the host
portion of the fusion protein. It may be further desired to
include a purification segment, thereby providing a region of
the fusion protein conferring functional properties
exploitable to provide simplified purification prior to
specific cleavage. Following specific cleavage, the
purification portion remains attached to the host portion and
simplifies the separation of the host portion from the
desired protein. In some instances, it may be preferable to
express the desired protein as a non-fusion protein. In that
case, the use of an expression segment or an expression
control segment linker conveniently provides for direct
expression of the coding segment. It will be understood that
such direct expression depends upon the existence of an
initiation codon. If the initiation codon is not included in
the coding segment, it can be provided as part of the
expression segment. Where an N-terminal methionine is not
desired, a specific cleavage segment may be interposed
between the initiating methionine codon and the coding
segment. A purification segment linker may be included to
provide for rapid purification of the expression product.
Alternatively, the use of an expression segment linker may be
followed by a linker coding for a signal peptide which can
cause the secretion of the expression product from the host
cell.

The particular combination of linkers chosen to aid in
the expression of a given desired protein will depend upon the
nature of the desired protein and upon functional properties
of the expression system. Some of the described linkers are
appropriate for both procaryotic and eucaryotic hosts, while
others are specific for a particular type of host cell. Such
choices will be made as a matter of ordinary skill. Other
combinations of the described linkers not specifically
disclosed herein are contemplated as within the scope of the
present invention.

DETAILED DESCRIPTION OF THE INVENTION
The specific cleavage linkers are deoxynucleotide
sequences coding for amino acid sequences which contain
specific cleavage sites. A specific cleavage linker is
attached to a coding segment prior to its transfer to a
microorganism. The advantage of a specific cleavage
linker is that it provides a specific cleavage sequence
having a specific cleavage site at the junction bond of the
fusion protein. This bond can be cleaved to produce the
desired protein.

Using current recombinant DNA technology, it is
possible to insert an isolated coding segment into a transfer
vector, transform a microorganism with this transfer vector,
and under appropriate conditions have the coding segment
expressed by the microorganism. Frequently it is desirable
to connect the coding segment to a portion of a host gene
which codes for a protein that is normally excreted from the
cell. This is done so that the expression product, a fusion
protein comprising a host protein portion and the desired
protein, is compartmentalized or excreted from the cell into
the culture medium. This process is desirable because it
reduces or eliminates the degradation of the desired protein
within the cell. In the case of a fusion protein excreted
into the culture medium, the fusion protein is easier to
purify because there is less total protein in the culture
medium than in a whole cell lysate.

A separate advantage of fusion protein expression is
that there are frequently well-known means for purifying
the host portion. Such means will often be applicable to
the fusion protein as well. Affinity chromatography is
especially preferred, where applicable.

The major difficulty encountered with this process is
the need to remove the desired protein from the host portion
of the fusion protein. This step is required in order to
purify the desired protein. It is difficult because there is
usually no specific cleavage site located between the amino
terminus of the desired portion and the carboxyl terminus of
the host portion which can be attacked uniquely by specific
chemical or enzymatic means. Thus, the present invention
provides for the incorporation of a specific cleavage
sequence between the desired protein and the host portion of
the fusion protein.

There are many methods for cleaving proteins, as
discussed above. Examples of chemical means include cyanogen
bromide (CNBr) and hydroxylamine. See Spande, T.F., et al.,
supra. Examples of proteolytic enzymes include trypsin,
papain, pepsin, thrombin (3.4.4.13) and enterokinase. See
The Proteins, supra, Meth. Enzymol. Vol. XIX, supra, and
Meth. Enzymol. Vol. XLV, supra. However, many of these
means do not show enough specificity to be useful for the
present invention. That is, many of these means recognize
only a single amino acid residue and cleave at that point.
Thus, except in very few situations, these same means will
cause cleavage to occur within the desired protein.

The present invention undertakes to create a situation
for proteins similar to that of restriction endonucleases for
DNA. As discussed above, a restriction enzyme will recognize
a specific sequence of DNA and cleave the DNA at this point.
The present invention provides for a specific amino acid
sequence containing one or more amino acid residues which is
recognized by a particular chemical or enzymatic cleavage
means. The specific amino acid sequence is incorporated
into a fusion protein between the host portion and the
desired protein. This is accomplished by chemically
synthesizing a deoxynucleotide sequence which codes for the
specific amino acid sequence. This DNA sequence is then
attached to an isolated gene prior to its incorporation in
a transfer vector. This DNA sequence is herein termed a
specific cleavage linker. The specific amino acid sequence
is herein termed a specific cleavage portion. The specific
cleavage portion contains a specific cleavage site. The
specific cleavage portion is selected so that it does not
occur, or is unlikely to occur, within the desired protein.
In this manner, the desired protein is separated from the
host portion of the fusion protein without itself being
degraded.

In selecting a specific cleavage sequence, several
factors must be considered. If the amino acid sequence of
the desired protein is known, it is a fairly simple matter
to select a specific cleavage sequence. In this case it is
preferred that the specific cleavage sequence not be found
within the desired protein. For example, human proinsulin
does not contain any methionines. Therefore, methionine
could be selected as the specific cleavage sequence. If the
DNA sequence coding for methionine (ATG) were attached to
the isolated human proinsulin gene prior to insertion in a
transfer vector, the fusion protein produced upon expression
could be treated with CNBr under appropriate conditions to
cleave human proinsulin from the host protein. See
Konigsberg, W.H. et al. at p. 2 in The Proteins, supra.
Similarly, human proinsulin does not contain the sequence
X-Phe-Arg-Y. The enzyme kallikrein B (3.4.21.8) recognizes
this sequence and cleaves on the carboxyl side of the
arginine. See Fiedler, F. at p. 289 in Meth. Enzymol.,
Vol. XLV, supra. Thus, by attaching the DNA sequence coding
for Phe-Arg (TTK WGZ) to the isolated human proinsulin gene
prior to insertion, the fusion protein produced upon
expression could be cleaved with kallikrein B to obtain
human proinsulin. Thus, when the desired protein sequence is
known, it is possible to select as the specific cleavage
sequence any amino acid sequence which is specifically
recognized by a chemical or enzymatic cleavage means and
does not appear in the desired protein sequence.
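
When the desired protein sequence is known, the selection described above amounts to a substring test. The Python sketch below encodes only the two examples from the text (CNBr at Met, kallikrein B at -Phe-Arg-) in one-letter codes; the table and function names are hypothetical, and the test sequences are made up rather than taken from proinsulin.

```python
# Recognition sequences from the examples in the text (one-letter codes).
CLEAVAGE_MEANS = {
    "cyanogen bromide": "M",  # cleaves on the carboxyl side of Met
    "kallikrein B": "FR",     # cleaves on the carboxyl side of Arg in -Phe-Arg-
}


def usable_cleavage_means(desired_protein: str, means: dict = CLEAVAGE_MEANS) -> list:
    """Names of cleavage means whose recognition sequence does not occur
    anywhere in the desired protein."""
    return [name for name, site in means.items() if site not in desired_protein]


if __name__ == "__main__":
    print(usable_cleavage_means("GAKFRVLE"))  # hypothetical sequence with -Phe-Arg- and no Met
```

Further cleavage means can be added to the table as their recognition sequences are established. Motifs with variable positions need a pattern match rather than a plain substring test.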

Selecting the specific cleavage sequence is more
difficult where the amino acid sequence of the desired
protein is unknown. In this case, it is preferred to use
a sequence having at least two amino acid residues. The
greater the number of amino acid residues in the specific
cleavage sequence, the less likely it is that the same
sequence occurs within the desired protein. This increases
the probability of uniquely cleaving the desired protein
from the host portion. When at least two amino acid
residues are required for the specific recognition site, the
preferred cleavage means is enzymatic. One possible chemical
means which could be used is hydroxylamine. Hydroxylamine
cleaves the -Asn-Z- bond, where Z may be Gly, Leu or Ala.
The rate of hydrolysis for Z=Gly is much faster than for
Z=Leu or Ala. See Konigsberg, W.H. et al., supra.

Another factor which can affect the selection of the
specific cleavage sequence is the rate of hydrolysis by a
particular cleavage means for similar amino acid sequences.
For example, suppose enzyme A recognizes and cleaves on the
carboxyl side of C or D in the amino acid sequences -A-B-C-
or -A-B-D-, but the rate of hydrolysis of the former is much
greater than that of the latter. Assume -A-B-C- is chosen as
the specific recognition sequence and -A-B-D- exists in the
desired protein. By exhaustive hydrolysis with enzyme A, it
is possible to get cleavage on the carboxyl side of both "C"
and "D". However, the rate of hydrolysis for -A-B-C- is
much greater than that for -A-B-D-, so that most of the
initial cleavages will occur in -A-B-C-, i.e., on the
carboxyl side of C. Therefore, a selective cleavage at the
desired site can be achieved by resorting to a partial
hydrolysis. Although the yield may be reduced, it should
still be significant enough to warrant the use of enzyme A
in this situation. However, this situation is not the
preferred one.

The extended active site is the most important factor
to consider in selecting the appropriate enzyme. The enzyme
must be able to recognize at least two amino acid residues,
and preferably more than two. This will decrease the
probability of cleavage within the desired protein, as
discussed above. For example, an enzyme which recognizes
the amino acid sequence -X-Y-Z- and cleaves on the carboxyl
side of Z would be useful for the present invention. An
enzyme which recognizes a sequence of several amino acids
but may cleave on the carboxyl side of two different amino
acids when substituted in the sequence may also be useful if
the rates of hydrolysis for the two are different, as
discussed above. An enzyme which cleaves in the inner part
of the specific cleavage sequence would also be useful when
used in conjunction with specific aminopeptidases. For
example, an enzyme which recognizes the amino acid sequence
-A-B-C-D- and cleaves on the carboxyl side of B would be
useful when used in conjunction with an aminopeptidase which
would specifically cleave C-D from the remainder of the
desired protein. This enzyme would also be useful if C-D- is
the N-terminus of the desired protein.
It is contemplated that any enzyme which recognizes a
specific sequence and causes a specific cleavage can be
utilized for the present invention. This specific
recognition and cleavage may be the function of the enzyme
under its normal enzymatic conditions or under special
restricted conditions. For example, it has been shown that
Aspergillopeptidase B has a very narrow specificity at 0°C,
whereas it has a fairly broad specificity at 37°C. See
Spadari, S. et al., Biochim. Biophys. Acta 359, 267 (1974).
The following enzymes are examples of enzymes which are
expected to be useful for the present invention:
enterokinase, kallikrein B or chymosin. Enterokinase
recognizes the sequence X-(Asp)n-Lys-Y, where n=2-4, and
cleaves on the carboxyl side of Lys. The rate of binding
increases by 10-20 times as n increases from 2 to 4, as shown
by studies with synthetic peptides. See Maroux, S. et al.,
supra. It has recently been determined that Glu or a
combination of Asp and Glu can be substituted for the Asp
and that Arg can be substituted for Lys. See Liepnieks, J.,
Ph.D. Thesis, Purdue University (1978). Kallikrein B
recognizes the sequence X-Phe-Arg-Y and cleaves on the
carboxyl side of Arg. See Fiedler, F., supra. Chymosin
recognizes the sequence X-Pro-His-Leu-Ser-Phe-Met-Ala-Ile-Y
and cleaves the Phe-Met bond. See Visser, S. et al., supra,
and Visser, S. et al., Biochim. Biophys. Acta 481, 171
(1977). Two other enzymes which should prove to be useful
once their extended active sites have been studied
thoroughly are urokinase (3.4.99.26) and thrombin. Urokinase
has been shown to recognize and cleave only an Arg-Val bond
found in the sequence X-Arg-Val-Y of plasminogen. See
Robbins, K.C., et al., J. Biol. Chem. 242, 2333 (1967).
Thrombin cleaves on the carboxyl side of Arg but will only
cleave at specific arginyl bonds. It has been shown that the
sequence X-Phe-(Z)6-Arg-Y, where Z can be any combination of
amino acids, is present in several of the substrates for
thrombin. See Magnusson, S., at p. 277 in The Enzymes, Vol.
III, supra.
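
In the spirit of a restriction map for DNA, the recognition sequences listed above can be turned into a "cleavage map" of a fusion protein. The Python sketch below paraphrases those sequences as regular expressions over one-letter codes, including the Glu-for-Asp and Arg-for-Lys substitutions reported for enterokinase. It is a simplification: it ignores rates, flanking residues and the still poorly defined thrombin sites, and the dictionary and function names are hypothetical.

```python
import re

# Recognition patterns paraphrased from the text (one-letter codes). Each entry holds
# a compiled pattern and the offset within a match of the cleaved bond
# (cut just before the residue at that offset); None means "after the last residue".
PROTEASE_SITES = {
    # X-(Asp/Glu)n-(Lys/Arg)-Y, n = 2-4; cleaves after Lys/Arg
    "enterokinase": (re.compile(r"[DE]{2,4}[KR]"), None),
    # X-Phe-Arg-Y; cleaves after Arg
    "kallikrein B": (re.compile(r"FR"), None),
    # X-Pro-His-Leu-Ser-Phe-Met-Ala-Ile-Y; cleaves the Phe-Met bond
    "chymosin": (re.compile(r"PHLSFMAI"), 5),
    # X-Arg-Val-Y; cleaves the Arg-Val bond
    "urokinase": (re.compile(r"RV"), 1),
}


def cleavage_map(protein: str) -> dict:
    """Map each enzyme to the sorted cut positions (index of the first residue
    after the cut) predicted by the simplified patterns above."""
    result = {}
    for name, (pattern, offset) in PROTEASE_SITES.items():
        cuts = []
        for match in pattern.finditer(protein):
            cut = match.end() if offset is None else match.start() + offset
            if 0 < cut < len(protein):  # positions 0 and len(protein) are not peptide bonds
                cuts.append(cut)
        result[name] = sorted(cuts)
    return result


if __name__ == "__main__":
    # Hypothetical fusion: host residues, enterokinase site, start of the desired protein.
    print(cleavage_map("GSDDDDKFVNQ"))
```

An empty list for an enzyme means its simplified recognition sequence is absent. A single cut at the junction is the outcome the specific cleavage linker is designed to produce.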
Another enzyme which may be useful is the "signal
peptidase". See Blobel, G., supra, and Jackson, R.C. et al.,
Proc. Nat. Acad. Sci. USA 74, 5598 (1977). This enzyme
recognizes and cleaves the signal peptide from a protein.
By incorporating the signal peptide between the desired
protein and the host portion of the fusion protein, specific
cleavage may be accomplished during secretion of the fusion
protein from the host to yield the desired protein.

Any chemical or enzymatic means which recognizes a
specific sequence and causes a specific cleavage can be
utilized for the present invention. First, the appropriate
cleavage means for a particular desired protein is chosen.
Then a DNA sequence is chemically synthesized which codes for
the specific amino acid cleavage sequence dictated by the
appropriate cleavage means. The DNA sequence is synthesized
by the phosphotriester method as described by Itakura, K. et
al., J. Biol. Chem. 250, 4591 (1975), and Itakura, K. et al.,
J. Am. Chem. Soc. 97, 7515 (1975), or by other suitable
synthetic means. For example, where enterokinase is the
cleavage means, a DNA sequence which codes for
Asp-Asp-Asp-Asp-Lys, as an example, is synthesized. This DNA
sequence would then be GAKGAKGAKGAKAAJ. A preferred DNA
sequence will be based upon a consideration of the codons
preferentially employed in the host cell. For example, in
E. coli the preferred DNA sequence would be GATGATGATGATAAA.
DNA coding for a desired protein is isolated using
conventional techniques, such as the cDNA technique. See,
for example, Ullrich, A. et al., supra, and Seeburg, P.H. et
al., supra. The chemically synthesized DNA sequence is then
attached to the isolated DNA by DNA ligase-catalyzed blunt
end ligation as described by Sgaramella, V. et al., Proc.
Nat. Acad. Sci. USA 67, 1468 (1970). This specific cleavage
linker-gene DNA is then treated by the addition of a second
deoxynucleotide sequence containing a restriction site, this
second sequence being attached to the specific cleavage
linker-gene DNA by DNA ligase-catalyzed blunt end ligation.
Restriction site linkers and their use have been described
by Heyneker, H.L., et al., supra, and by Scheller, R.H. et
al., supra. Such restriction site linkers are modified
according to the present invention to provide 0, 1 or 2
additional deoxynucleotides. The latter deoxynucleotides
provide for all three reading frames. Alternatively, linkers
could be synthesized which contain a restriction linker, 0, 1
or 2 additional deoxynucleotides and a specific cleavage
linker. This composite linker could then be attached to the
isolated coding sequence by a single blunt end ligation step.
Or, two DNA sequences could be synthesized, one containing a
restriction linker and 0, 1 or 2 deoxynucleotides and the
other containing the specific cleavage linker. These two
sequences could be joined by blunt end ligation and then
attached to the isolated coding sequence by blunt end
ligation. The final product, i.e., restriction linker - 0, 1
or 2 deoxynucleotides - specific cleavage linker - DNA coding
sequence, is then inserted in a transfer vector using
conventional techniques. It will be understood in the art
that the foregoing steps of blunt end ligation will attach
the linker sequences at both ends of the coding segment.
However, as the latter will contain or will be provided with
a termination codon, the coding sequences attached
downstream of the termination codon, in the direction of
translation, will remain untranslated. A microorganism can
then be transformed with the transfer vector, and expression
of the gene is obtained under appropriate conditions.
Techniques for accomplishing the above are more fully
described in copending application of Bell et al., Serial No.
75,192, filed September 12, 1979, and copending application
of Rutter et al., Serial No. 933,035, filed August 11, 1978,
both incorporated herein by reference. The fusion product
resulting from expression is purified, preferably as
described infra, and subjected to cleavage by the selected
means.
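
Two of the design steps above can be checked in code: that the degenerate linker GAKGAKGAKGAKAAJ codes for Asp-Asp-Asp-Asp-Lys in every reading of its ambiguous bases, and how many of the 0, 1 or 2 extra nucleotides restore the reading frame. The Python sketch below assumes, consistent with the codons shown (Asp = GAT/GAC, Lys = AAA/AAG), that J stands for A or G and K for T or C. These letters differ from the standard IUPAC codes, where K means G or T. The frame count assumes we know how many nucleotides of the host reading frame precede the cleavage linker. All names are hypothetical.

```python
from itertools import product

# Degenerate-base symbols as inferred from this document's sequences
# (not IUPAC): J = A or G, K = T or C, L = any base.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "J": "AG", "K": "TC", "L": "ACGT"}

# Minimal codon table covering only the residues in the enterokinase linker;
# translate() raises KeyError for any other codon.
CODON_TABLE = {"GAT": "D", "GAC": "D", "AAA": "K", "AAG": "K"}


def expand(degenerate: str) -> list:
    """Every concrete DNA sequence that a degenerate sequence stands for."""
    return ["".join(bases) for bases in product(*(SYMBOLS[s] for s in degenerate))]


def translate(dna: str) -> str:
    """Translate complete codons in frame 0 using the minimal table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def pad_length(bases_before_linker: int) -> int:
    """Number of extra nucleotides (0, 1 or 2) that puts the cleavage linker in
    frame, given the nucleotides of host reading frame preceding it."""
    return (-bases_before_linker) % 3


if __name__ == "__main__":
    variants = expand("GAKGAKGAKGAKAAJ")  # Asp-Asp-Asp-Asp-Lys, as written in the text
    assert all(translate(v) == "DDDDK" for v in variants)
    assert "GATGATGATGATAAA" in variants  # the E. coli-preferred choice given in the text
    print(len(variants), "codings; pad for 40 upstream bases:", pad_length(40))
```

Synthesizing all three padded versions, as the text describes, avoids having to know the frame in advance. pad_length only shows which of the three will be the productive one.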

Purification segments coding for amino acid sequences
that contribute ease of purification can be included as
linkers such that the added purification portion is on the
N-terminal side of the junction bond and is thereby removed
following specific cleavage. Such linkers may be separately
ligated or incorporated with other linker segments in a
single composite linker.

The kinds of amino acid sequences that contribute
ease of purification include polyanionic segments
(Asp/Glu)5-20 and polycationic segments (Lys/Arg)5-20 that
will bind readily to ion exchangers. A polyanionic segment
can serve a dual function as an enterokinase extended site
sequence if provided with a C-terminal lysine or arginine
residue. A hydrophobic segment may be composed of Leu, Ile,
Val and/or Phe residues.
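
The choice of separation medium for these segment types follows from their composition: negatively charged segments bind an anion exchanger, positively charged segments bind a cation exchanger, and hydrophobic segments bind reverse phase carriers. The Python sketch below makes that mapping explicit. The 80% composition threshold is an illustrative assumption, not a value from the text, and the names are hypothetical.

```python
ANIONIC = set("DE")       # Asp, Glu
CATIONIC = set("KR")      # Lys, Arg
HYDROPHOBIC = set("LIVF") # Leu, Ile, Val, Phe


def suggest_adsorbent(segment: str, threshold: float = 0.8) -> str:
    """Crude suggestion of a separation medium from residue composition.
    The threshold is an illustrative assumption."""
    n = len(segment)
    if n == 0:
        raise ValueError("empty segment")
    anionic = sum(r in ANIONIC for r in segment) / n
    cationic = sum(r in CATIONIC for r in segment) / n
    hydrophobic = sum(r in HYDROPHOBIC for r in segment) / n
    if anionic >= threshold:
        return "anion exchanger"  # negatively charged segment binds a positively charged resin
    if cationic >= threshold:
        return "cation exchanger"  # positively charged segment binds a negatively charged resin
    if hydrophobic >= threshold:
        return "hydrophobic (reverse phase) carrier"
    return "no strong preference"


if __name__ == "__main__":
    # Hypothetical dual-function segment: polyanionic run ending in Lys (enterokinase site).
    print(suggest_adsorbent("DDDDDDDDDK"))
```

The hypothetical example shows the dual use described above: a mostly acidic run ending in Lys still classifies as polyanionic, while its C-terminal Asp-Asp-Asp-Asp-Lys also fits the enterokinase recognition sequence.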

More specific, single-step purification can be achieved by
the use of affinity chromatography. In principle, the
affinity adsorbent could bind any part of the expressed
protein. Preferably, the specific binding is directed toward
that portion destined to be removed from the desired protein.
Given a fusion protein, the specific affinity could be an
immunochemical binding of the procaryotic portion.
Alternatively, the specificity could be provided by the
purification segment. For example, a linker segment coding
for bradykinin would be incorporated to provide the
bradykinin sequence as part of the fusion protein. An
immunoadsorbent specific for bradykinin (comprising
bradykinin antibody) then specifically binds the fusion
protein. The desired protein is then removed from the
adsorbed complex by specific cleavage; the unwanted portion
remains adsorbed and is readily separated. Other examples
will be apparent to those ordinarily skilled in the art.
Providing a highly hydrophobic purification segment also
permits rapid and specific separation by adsorption to
hydrophobic (reverse phase) solid phase carriers, by
selective precipitation, and by differential solubility in
non-aqueous media.

A special case of purification linker involves incorporating the signal peptide sequence in the expression product. The amino acid sequences of known signal peptides are sufficiently short to make feasible the synthesis of linkers coding therefor. Since the signal peptide is functional as an N-terminal peptide, its use will be in conjunction with direct expression of the desired protein as a non-fusion protein, as described infra. Furthermore, the use of a specific cleavage linker will be unnecessary, since signal peptides are normally removed from the desired protein product by a signal peptidase endogenous in the host cell. Therefore, the use of a signal peptide linker can result in secretion of the desired protein and removal of the signal peptide, mediated by endogenous host functions.

The appropriate use of linkers according to the present invention provides means for expressing a coding segment as a non-fusion protein. The required linker for such direct expression is an expression control segment comprising a promoter sequence, a ribosomal binding site sequence, and a spacer of about 3-11 nucleotides. Any coding segment providing an initiation codon (ATG) within a distance of 3-11 nucleotides from the ribosomal binding site sequence will be expressed in the correct reading frame. It is not necessary to provide a coding segment having ATG at its 5'-end, provided the ATG sequence is located within 3-11 nucleotides of the ribosomal binding site of the linker. An example of a procaryotic ribosomal binding site would have the following sequence in its plus strand: L(n)TAGGAGGATCC, where L is A, T, C or G, and n may be 0, 1 or 2. For convenience, DNA sequences are designated by the plus strand. However, it will be understood that all such linker segments also have a minus strand of complementary base sequence and opposite polarity. The foregoing sequence includes the following elements: a ribosomal binding site sequence substantially homologous with the 3'-end of the 16S ribosomal RNA, as shown by Shine and Dalgarno, Proc. Nat. Acad. Sci. USA 71, 1342 (1974), and by Steitz and Jakes, Proc. Nat. Acad. Sci. USA 72, 4734 (1975). The ribosomal binding sites so far studied are variable in their degree of homology with the 16S ribosomal RNA sequence. The maximum number of complementary bases so far found is seven; the above-described sequence contains six. The above-described sequence also contains a stop codon (TAG) which is designed to prevent read-through translation of any message initiated elsewhere.
In order that the stop codon be in phase with the message to be terminated, the sequence is provided with 0, 1 or 2 additional nucleotides. The inclusion of a termination codon may not be necessary in some instances. A universal terminator providing termination in all three phases is provided by the sequence TAGLTAGLTAG. The above-described ribosomal binding site segment also contains a Bam HI linker sequence, GGATCC. The linker is useful for attaching additional sequence material to the ribosomal binding site segment, for identifying DNA sequences into which the linker has been introduced, and in some instances for inserting the ribosomal binding site linker.
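The three-phase property of the universal terminator can be checked mechanically. The Python sketch below (the helper name `frames_with_tag` is illustrative, not from the source) enumerates all 16 choices for the two L positions and confirms that every reading frame of TAGLTAGLTAG meets an in-frame TAG:

```python
from itertools import product


def frames_with_tag(seq: str) -> set:
    """Return the reading frames (0, 1 or 2) that contain an in-frame TAG codon."""
    hits = set()
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] == "TAG":
                hits.add(frame)
                break
    return hits


# Each L may independently be A, T, C or G: 16 variants in total.
for l1, l2 in product("ATCG", repeat=2):
    terminator = f"TAG{l1}TAG{l2}TAG"
    assert frames_with_tag(terminator) == {0, 1, 2}, terminator
print("TAGLTAGLTAG: TAG in frames 0, 1 and 2 for all 16 variants")
```

The check works because each TAG is offset from the previous one by four bases, so successive copies fall into successive reading frames.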

For joining the ribosomal binding site segment to the coding segment, a spacer sequence of 3-11 base pairs is desired. This can be done most conveniently by blunt-end ligation of one of the commercially available restriction site linkers (Scheller et al., supra). These linkers can be modified as desired by treatment with the appropriate restriction endonuclease, followed by filling or trimming the unpaired ends thus produced to provide the desired spacer sequence. For example, the Eco RI linker GGAATTCC can be treated with endonuclease Eco RI followed by DNA polymerase to fill in the unpaired end to provide the sequence AATTCC. The ribosomal binding site sequence bearing a Bam HI linker sequence is similarly treated with Bam HI endonuclease and DNA polymerase such that its structure is now L(n)TAGGAGGATC. Blunt-end ligation provides the sequence L(n)TAGGAGGATCAATTCC. If a coding segment having a terminal ATG initiation codon is attached, the initiation codon will be eight base pairs from the ribosomal binding site.
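The Eco RI and Bam HI manipulations just described can be traced on the plus strand with a few lines of Python. This is a simplified model under stated assumptions: the recognition site is palindromic, the enzyme leaves a 5' overhang, only the first occurrence of the site is cut, and DNA polymerase fills every overhang completely. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _check_site(site: str, cut: int) -> None:
    # The model only holds for palindromic sites cut so as to leave a 5' overhang.
    if site != site.translate(COMPLEMENT)[::-1] or not 0 <= cut < len(site) / 2:
        raise ValueError("model assumes a palindromic site with a 5' overhang")


def left_after_fill(seq: str, site: str, cut: int) -> str:
    """Plus strand of the fragment upstream of the site, after fill-in."""
    _check_site(site, cut)
    pos = seq.index(site)
    # The minus strand is cut at len(site) - cut; polymerase extends the plus strand to it.
    return seq[:pos + len(site) - cut]


def right_after_fill(seq: str, site: str, cut: int) -> str:
    """Plus strand of the fragment downstream of the site, after fill-in."""
    _check_site(site, cut)
    pos = seq.index(site)
    # The overhang lies on this plus strand; filling the minus strand leaves it unchanged.
    return seq[pos + cut:]


ECO_RI = ("GAATTC", 1)   # G^AATTC
BAM_HI = ("GGATCC", 1)   # G^GATCC

spacer = right_after_fill("GGAATTCC", *ECO_RI)    # Eco RI linker -> AATTCC
rbs = left_after_fill("TAGGAGGATCC", *BAM_HI)     # n = 0 case -> TAGGAGGATC
print(rbs + spacer)                               # blunt-end ligation product
```

With n = 0 the sketch prints TAGGAGGATCAATTCC, the sequence given above; prefixing L(n) bases does not change the result.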
The function of a ribosomal binding site linker will vary depending upon the chosen insertion site in the transfer vector. If the insertion interrupts a normally translated message, the ribosomal binding site linker is likely to serve as a reinitiation point for translation. However, the efficiency of translation may be improved by making the insertion at a site adjacent to an existing, known promoter, in the direction of normal transcription. For example, insertion at a site adjacent to the promoter of the tryptophan operon will result in direct translation of the inserted segment in place of the normally expressed proteins of the tryptophan operon, under control of the tryptophan promoter. If it is desired to insert the coding segment in a silent region of the transfer vector, it will be necessary to provide a promoter sequence to ensure proper initiation of transcription.

Sequences which can function as initiators of procaryotic transcription are known. See, for example, Pribnow, D., Proc. Nat. Acad. Sci. USA 72, 784 (1975). For example, the sequence TATJATJ, where J is A or G, appears to provide promoter function. In eucaryotes the sequence TATAAA, or the similar sequences TATAAT and TATAAG, are found in the region prior to transcription initiation and are likely to be part of a promoter region. See Gannon, F., et al., Nature 278, 428-34 (1979). However, other nucleotides outside the described sequence can modify its efficiency of promoter function in ways which are not presently predictable. Therefore, while it is presently feasible to provide an expression control segment linker comprising both synthetic promoter and synthetic ribosomal binding site segments, it is preferred to employ naturally occurring promoters, either separately cloned or by insertion adjacent thereto.
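The degenerate notation used here (J for A or G, and elsewhere in this document L for any base and K for T or C) maps directly onto regular-expression character classes. The following Python sketch, with a hypothetical `to_regex` helper, tests candidate sequences against TATJATJ. It performs only pattern matching and says nothing about promoter strength, which, as noted above, depends on flanking nucleotides.

```python
import re

# Ambiguity letters as used in this document. These are not IUPAC codes:
# IUPAC writes "any base" as N and A/G as R, and IUPAC K means G/T,
# whereas this document's K means T/C (IUPAC Y).
DEGENERATE = {"L": "ACGT", "J": "AG", "K": "CT"}


def to_regex(pattern: str) -> re.Pattern:
    """Translate a degenerate plus-strand pattern into a compiled regex."""
    return re.compile("".join(
        f"[{DEGENERATE[b]}]" if b in DEGENERATE else b for b in pattern))


pribnow = to_regex("TATJATJ")
print(pribnow.pattern)                        # TAT[AG]AT[AG]
print(bool(pribnow.fullmatch("TATGATG")))     # True
print(bool(pribnow.fullmatch("TATCATG")))     # False
print(pribnow.search("GGTATAATGCC").start())  # 2 (match TATAATG)
```

The same helper accepts the other degenerate patterns in this document, since it only expands the letters listed in `DEGENERATE`.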

A ribosomal binding site linker suitable for expression in eucaryotic cells is provided by a segment homologous to the terminal sequence of the 18S ribosomal RNA found in eucaryotes; see Hagenbuchle, et al., Cell 13, 551 (1978). The sequence GGATCCTTCC can be synthesized simply by joining the sequence TTCC to the 3'-end of the commercially available Bam HI linker. The resulting sequence GGATCCTTCC has eight bases complementary to the 18S ribosomal RNA sequence, and should therefore provide an excellent initiation site for translation. Techniques similar to those previously disclosed may be employed to provide the requisite spacer nucleotides. In addition, the disclosed eucaryotic ribosomal binding site sequence can be joined to itself by blunt-end ligation to provide two ribosomal binding sites, one adjacent to the initiation codon, the other ten base pairs away. Similarly, the procaryotic ribosomal binding site linker previously described can be employed as a spacer. The latter additionally provides a termination codon, should it prove desirable to prevent read-through translation.
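The construction of the eucaryotic ribosomal binding site linker and its blunt-end dimer can be traced with plain string operations. This Python sketch assumes the two copies join head to tail; blunt-end ligation can also join copies in inverted orientation, which the sketch does not model.

```python
import re

BAM_HI_LINKER = "GGATCC"
euk_rbs = BAM_HI_LINKER + "TTCC"   # TTCC joined to the 3'-end of the Bam HI linker
dimer = euk_rbs + euk_rbs          # assumed head-to-tail blunt-end joining

# Start positions of each binding-site copy within the dimer.
starts = [m.start() for m in re.finditer(euk_rbs, dimer)]
print(euk_rbs)   # GGATCCTTCC
print(dimer)     # GGATCCTTCCGGATCCTTCC
print(starts)    # [0, 10]
```

The two copies start ten bases apart, so when the downstream copy abuts the initiation codon the upstream copy lies ten base pairs further away.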

A more complete appreciation of the invention will be realized by reference to the following specific examples. Enterokinase and human proinsulin will be used in these examples for illustration purposes only. These examples are not intended to limit the invention disclosed herein except to the extent to which limitations appear in the appended claims. Reference to a procaryotic host such as E. coli is made for convenience in the examples. The linkers of the present invention are also used for expression by a eucaryotic host, following generally the principles of the invention and applying ordinary skills in the art.
Example 1

This example describes the preparation of a cloned 
human proinsulin gene, synthesis of a specific cleavage 
linker and the joining of the two. 

An isolated and purified (hereinafter "cloned") DNA sequence coding for human proinsulin is prepared as described in copending application Serial No. 75,192.

Enterokinase is chosen as the specific cleavage means. The specific cleavage sequence for enterokinase is NH2-Asp-Asp-Asp-Asp-Lys-COOH. The DNA sequence of the plus strand coding for this amino acid sequence is 5'-GATGATGATGATAAA-3'. (The plus strand is defined as the strand whose nucleotide sequence corresponds to the mRNA sequence. The minus strand is the strand whose sequence is complementary to the mRNA sequence.) This DNA sequence is the specific linker sequence and is chemically synthesized using the phosphotriester method described by Itakura, K., et al., supra.
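The correspondence between the synthetic plus strand and the enterokinase recognition sequence can be checked with a deliberately partial codon table. In this Python sketch the table holds only the Asp and Lys codons, so any other codon raises a `KeyError`; a full genetic-code table would be needed for general use.

```python
# Partial codon table: only the Asp and Lys codons used by this linker.
CODONS = {
    "GAT": "Asp", "GAC": "Asp",
    "AAA": "Lys", "AAG": "Lys",
}


def translate(plus_strand: str) -> list:
    """Translate a plus-strand DNA sequence codon by codon (three-letter codes)."""
    if len(plus_strand) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    # An unlisted codon raises KeyError, since the table is deliberately partial.
    return [CODONS[plus_strand[i:i + 3]] for i in range(0, len(plus_strand), 3)]


print("-".join(translate("GATGATGATGATAAA")))  # Asp-Asp-Asp-Asp-Lys
```

Because the code is degenerate, other plus strands (for example with GAC in place of GAT) encode the same cleavage sequence.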

The foregoing sequence is then blunt-end ligated to the commercially available Hin dIII linker which, when cleaved with Hin dIII endonuclease, yields a specific cleavage linker suitable for insertion at a Hin dIII site. The nucleotide sequence of both strands of the product linker is

AGCTTGGATGATGATGATAAA
    ACCTACTACTACTATTT
By convention, the upper strand is the plus strand and is shown with the 5'-end to the left and the 3'-end to the right, the lower strand having the opposite polarity. Expression in either of the other two reading frames is provided by prior modification of the Hin dIII linker, either by the removal of one of the 3'-terminal G's or by addition of an extra 3'-terminal G. The resulting sequence of the composite linker will be one nucleotide shorter or one nucleotide longer, respectively, to provide for expression of the specific cleavage site sequence, and of the coding segment to which it is attached, in the correct reading frame.
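The two-strand layout and the reading-frame adjustment can be reproduced in Python. The sketch assumes that the four-base AGCT overhang left by Hin dIII is the only single-stranded region, and it measures frame offsets relative to the first base of the cleaved linker as printed; which offset is actually required depends on the Hin dIII site in the chosen vector, which the text does not specify.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

plus = "AGCTTGGATGATGATGATAAA"   # plus strand as printed above
overhang = 4                      # AGCT is single-stranded after Hin dIII cleavage
# Complement (not reverse complement), written 3' -> 5' beneath the plus strand.
minus_3to5 = plus[overhang:].translate(COMPLEMENT)
print(plus)
print(" " * overhang + minus_3to5)

# The enterokinase-site codons begin right after the Hin dIII-derived prefix.
cleavage = "GATGATGATGATAAA"
prefix = plus[: plus.index(cleavage)]   # "AGCTTG" for the sequence as printed

# Removing or adding one 3'-terminal G changes the prefix length by one,
# which moves the cleavage-site codons into a different reading frame.
variants = {
    "one G removed": prefix[:-1],
    "as printed": prefix,
    "one G added": prefix + "G",
}
frames = {name: len(p) % 3 for name, p in variants.items()}
print(frames)
```

The three prefix lengths (5, 6 and 7 bases) give offsets 2, 0 and 1, so one of the three linker variants always places the cleavage-site codons in phase.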

The specific cleavage linker is blunt-end ligated with the cloned human proinsulin gene to produce a deoxynucleotide sequence of the plus strand containing: 5'-Hin dIII linker-specific cleavage linker-human proinsulin gene-3'.

Example 2 

This example describes the cloning of the deoxynucleotide sequence from Example 1 into a suitable expression plasmid and the expression of said coding sequence.

The specific cleavage linker-human proinsulin gene is inserted in an expression transfer vector. When insertion occurs in the correct orientation with respect to initiation of translation at the insertion site, and the insert is in reading frame phase with the promoter and ribosome binding site, the protein product of the cloned coding segment is synthesized by actively metabolizing host cells transformed by the transfer vector.

When the cloned DNA coding segment codes for a peptide or small protein, it is preferable that the expression transfer vector contain a portion of a procaryotic gene between the promoter and the insertion site. The protein product in this instance is a fusion protein. The fusion protein tends to stabilize the foreign protein coded by the inserted gene in the intracellular milieu of the host. Excretion of the fusion protein from the host cell may also be accomplished by fusion with certain excretable host proteins, such as β-lactamase.

Expression plasmids have been developed wherein expression is controlled by the lac promoter (Itakura, K., et al., Science 198, 1056 (1977); Ullrich, A., et al., Excerpta Medica (1979)) and by the β-lactamase promoter (copending application of Baxter et al., Serial No. 44,637, filed June 1, 1979, incorporated herein by reference).

The preferred method of constructing an expression plasmid is to chemically synthesize a DNA sequence containing a restriction site found within the β-lactamase gene and n deoxynucleotides, where n = 0, 1 or 2, in order to provide a proper reading frame. This sequence is then blunt-end ligated to the modified human proinsulin gene prepared in Example 1. This new DNA sequence and the transfer vector are then treated with the same restriction enzyme. See Heyneker, H.L., et al., supra, and Scheller, R.H., et al., supra. The new DNA sequence is then inserted into the transfer vector, which is used to transform a host microorganism. A general inserted DNA sequence of the plus strand in accordance with the present invention can be shown as follows: 5'-restriction linker-b(n)c(m)-specific cleavage linker-cloned gene-3', where b and c may be any deoxynucleotide base and n and m are integers such that n + m = 0, 1 or 2.
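The condition n + m = 0, 1 or 2 is the number of bases needed to round the upstream sequence up to a whole number of codons. The Python sketch below states that arithmetic; `filler_length`, the 5-base example prefix, and the choice of A for the b/c filler bases are illustrative assumptions.

```python
def filler_length(upstream_bases: int) -> int:
    """Number of b/c bases (n + m) that puts the next base on a codon boundary.

    upstream_bases: plus-strand bases from the start of the vector's current
    codon to the end of the restriction linker (an assumption of this sketch).
    """
    return (-upstream_bases) % 3


for upstream in range(6):
    print(upstream, filler_length(upstream))   # fillers cycle 0, 2, 1

# Hypothetical 5-base restriction-linker remnant needing one filler base (b = A).
restriction_part = "AGCTT"
k = filler_length(len(restriction_part))
insert = restriction_part + "A" * k + "GATGATGATGATAAA"
print(insert, insert.index("GATGATGATGATAAA") % 3)   # cleavage linker at frame 0
```

Because the filler only has to absorb a remainder modulo 3, it never needs more than two bases, which is why n + m is limited to 0, 1 or 2.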

Expression is detected by measurement of a product capable of binding immunochemically with anti-insulin antibody or anti-proinsulin antibody. Fusion proteins indicative of expression are detected by comparing molecular weights of the host protein contributing the N-terminal part of the fusion protein in host cells transformed by expression plasmids with and without an insert.

The fusion protein for this specific example, having the formula X-Asp-Asp-Asp-Asp-Lys-Y, where X is a portion of the β-lactamase protein and Y is the human proinsulin protein, is purified using conventional techniques. The fusion protein is cleaved using enterokinase following the procedure described by Liepnieks, supra. Gel electrophoresis is conducted to determine whether proper cleavage is obtained, with human proinsulin serving as the standard. Two bands are obtained from the cleavage product, one of which migrates with the human proinsulin standard. Human proinsulin is then purified using conventional techniques.
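The expected outcome of the enterokinase digestion, two fragments of which one corresponds to proinsulin, can be modelled as a string split after the Lys of the Asp-Asp-Asp-Asp-Lys site. This Python sketch is hypothetical throughout: the residues of both partners are replaced by X placeholders, and the model simply cuts at the first exact occurrence of the pentapeptide.

```python
EK_SITE = "DDDDK"   # Asp-Asp-Asp-Asp-Lys in one-letter code


def enterokinase_digest(fusion: str) -> tuple:
    """Split X-DDDDK-Y after the Lys of the first site: returns (X-DDDDK, Y)."""
    i = fusion.find(EK_SITE)
    if i < 0:
        raise ValueError("no Asp-Asp-Asp-Asp-Lys site found")
    cut = i + len(EK_SITE)
    return fusion[:cut], fusion[cut:]


# Hypothetical placeholders: X stands for unspecified residues.
carrier = "M" + "X" * 20   # stand-in for the beta-lactamase portion
target = "X" * 86          # stand-in for human proinsulin (86 residues)
left, right = enterokinase_digest(carrier + EK_SITE + target)
print(len(left), len(right))   # 26 86
```

The Asp-Asp-Asp-Asp-Lys residues stay with the carrier fragment, so the released fragment begins with the first residue of the target protein.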
Example 3

A specific purification linker is provided by modifying the linker described in Example 1 having the sequence 5'-GATGATGATGATAAA-3'. The sequence is modified at the 3'-end by providing a C or preferably a T residue in place of the G. The modification can be accomplished by the use of T4 DNA polymerase in the presence of ATP and CTP to remove the 3'-terminal G, followed by nuclease to remove the 5'-terminal C on the complementary strand. A C or preferably a T may be added to the 3'-end, either by enzymatic or chemical means. The resulting sequence codes for the amino acids Asp-Asp-Asp-Asp-Asn. The modified nucleotide sequence is then coupled by blunt-end ligation to its unmodified homolog to yield 5'-GATGATGATGATAATGATGATGATGATAAA-3'.
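Translating the ligated purification linker shows where its charge comes from. This Python sketch uses a partial codon table covering only Asp, Glu, Asn and Lys codons and simply counts acidic and basic residues; it does not attempt a pH-dependent charge calculation.

```python
# Partial codon table: Asp, Glu, Asn and Lys codons only.
CODONS = {"GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
          "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K"}


def translate(seq: str) -> str:
    """One-letter translation; unlisted codons raise KeyError."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


modified = "GATGATGATGATAAT"     # Asn codon replaces the Lys codon at the 3'-end
unmodified = "GATGATGATGATAAA"   # Example 1 enterokinase linker
purification_linker = modified + unmodified
peptide = translate(purification_linker)
acidic = sum(peptide.count(aa) for aa in "DE")
basic = sum(peptide.count(aa) for aa in "KR")
print(peptide, acidic, basic)    # DDDDNDDDDK 8 1
```

Eight acidic residues against a single lysine give the strongly polyanionic tail that the ion-exchange step relies on.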

The foregoing sequence is then connected to a Hin dIII linker as described in Example 1, and further connected with a coding segment as described in Example 1.

When expressed as a fusion protein, as described in Example 2, the linker will provide that the fusion protein contains a polyanionic portion of significant length. The fusion protein will therefore bind tightly to anion exchange materials such as diethylaminoethyl cellulose, even under conditions of ionic strength where substantially all other proteins in the cell lysate are eluted.

The fusion protein is then either eluted from the ion exchanger or treated in situ with enterokinase. In the latter case, preferential cleavage occurs at the junction bond and the desired protein is released from the ion exchanger, while the procaryotic portion, bearing the polyanionic portion, remains bound to the ion exchanger. When the fusion protein is eluted from the ion exchanger prior to enterokinase treatment, incubation with enterokinase will cleave the junction bond preferentially, and the procaryotic portion may be removed from the reaction mixture by preferential binding to an ion exchanger, as before. By the foregoing procedure, substantially quantitative purification of the desired protein is achieved in two steps.

Example 4
In this example, the expression of a coding sequence such as that coding for human proinsulin is facilitated by the use of a ribosomal binding site linker. The nucleotide sequence AGGA is synthesized chemically by the method of Itakura, et al., supra. The synthetic sequence is then joined chemically or by blunt-end ligation to the Bam HI linker GGATCC, obtained commercially from New England BioLabs, Cambridge, Massachusetts. The resulting segment, AGGAGGATCC, is modified by treatment with Bam HI endonuclease followed by DNA polymerase I to fill in the single-stranded protruding end, to yield AGGAGGATC. Similarly, the coding segment is treated first by the addition of a Bam HI linker, followed by modification of the linker with Bam HI endonuclease and DNA polymerase I. The modified segments are then joined to each other by blunt-end ligation to yield the sequence AGGAGGATCGATCC-coding segment. The start of the coding segment is then located eight bases from the ribosomal binding site.
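The Example 4 assembly, and the eight-base spacer it produces, can be traced on the plus strand. This Python sketch models Bam HI cleavage followed by complete fill-in with DNA polymerase I, assumes the coding segment begins with ATG, and takes AGGAGG as the ribosomal binding site when counting the spacer; the helper names are hypothetical.

```python
SITE = "GGATCC"   # Bam HI recognition site, cut G^GATCC (4-base 5' overhang)


def bam_left_filled(seq: str) -> str:
    """Upstream fragment after Bam HI cleavage and fill-in (plus strand)."""
    pos = seq.index(SITE)
    # Cut after the first G of the site, then GATC is copied in by polymerase.
    return seq[:pos + 5]


def bam_right_filled(seq: str) -> str:
    """Downstream fragment after Bam HI cleavage and fill-in (plus strand)."""
    pos = seq.index(SITE)
    # GATCC... keeps its overhang on the plus strand; the minus strand is filled in.
    return seq[pos + 1:]


rbs_segment = "AGGA" + SITE      # AGGAGGATCC
coding = SITE + "ATG"            # assumed: coding segment starts with ATG
joined = bam_left_filled(rbs_segment) + bam_right_filled(coding)
print(joined)                    # AGGAGGATCGATCCATG

rbs_core = "AGGAGG"              # assumed definition of the binding site
spacer = joined.index("ATG") - (joined.index(rbs_core) + len(rbs_core))
print(spacer)                    # 8
```

Under these assumptions the spacer between AGGAGG and the ATG is ATCGATCC, eight bases, within the 3-11 nucleotide range required for correct initiation.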

The sequence ribosomal binding site-spacer-coding segment (human proinsulin) is further modified by the attachment of the appropriate restriction linker, depending on the desired insertion site. For example, an Eco RI linker is used for insertion in the gene coding for β-galactosidase. In contrast to prior results, however, expression does not result in production of a fusion protein, since the ribosomal binding site linker acts to reinitiate translation so that the segment coding for human proinsulin is expressed per se. The expression product is detected by immunochemical means.

Example 5

The ribosomal binding site linker of Example 4, the specific purification segment of Example 3, and the specific cleavage linker of Example 1 are combined by blunt-end ligation to yield a composite linker having the sequence AGGAGGATCGATCCATGGATGATGATGATAATGATGATGATGATAAA. Described in functional terms, the composite linker has the sequence ribosomal binding site-spacer-start codon-purification portion-specific cleavage site-coding segment. The composite is further modified by attachment of an Eco RI linker to facilitate insertion into the site of a plasmid such as pBGP120, described by Polisky, B., et al., supra. Transformation with the resulting transfer vector permits expression of human proinsulin having a polyanionic N-terminal portion. The expression product is then purified as described in Example 3, followed by specific cleavage using enterokinase. The combined techniques result in the production of highly purified human proinsulin. The principal advantage of the combined techniques is that, once the appropriate linkers have been attached to the coding segment, expression of the coding segment and specific purification of the expression product are accomplished by relatively simple procedures which can be carried out without difficulty on a large scale.
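The composite linker of Example 5 is a concatenation of the segments from Examples 1, 3 and 4 plus a start codon, and reading it from the ATG gives the N-terminal extension the expression product is expected to carry. The Python sketch below checks both, using a partial codon table that covers only the four codons involved.

```python
# Partial codon table: only the codons present in the composite linker's ORF.
CODONS = {"ATG": "M", "GAT": "D", "AAT": "N", "AAA": "K"}

parts = [
    ("ribosomal binding site + spacer", "AGGAGGATCGATCC"),   # Example 4
    ("start codon", "ATG"),
    ("purification portion", "GATGATGATGATAAT"),             # Example 3
    ("specific cleavage site", "GATGATGATGATAAA"),           # Example 1
]
composite = "".join(seq for _, seq in parts)
assert composite == "AGGAGGATCGATCCATGGATGATGATGATAATGATGATGATGATAAA"

# Translate from the first ATG to the end of the linker.
orf = composite[composite.index("ATG"):]
peptide = "".join(CODONS[orf[i:i + 3]] for i in range(0, len(orf), 3))
print(peptide)   # MDDDDNDDDDK
```

The translated extension, Met-(Asp)4-Asn-(Asp)4-Lys, places the polyanionic purification portion ahead of the enterokinase cleavage site, in the functional order given above.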

As a further alternative, the above-described composite linker can be further modified, prior to the addition of the restriction site linkers, by the addition of a sequence capable of functioning as a promoter, for example TATGATG. The use of such a promoter sequence in combination with the linker segments just described makes it possible to obtain expression at a greater variety of insertion sites on the transfer vector, including those which are normally silent.

While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications, and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth, and as follows in the scope of the appended claims.
1. A specific cleavage linker comprising a deoxynucleotide sequence coding for a specific cleavage sequence comprising a sequence of one or more amino acids which is specifically recognized and cleavable by enzymatic or chemical means.

2. The specific cleavage linker of claim 1 which also contains a deoxynucleotide sequence containing (i) a restriction site and (ii) b(n)c(m) at the 5'-end of said linker, wherein said restriction site is recognized and cleavable by a restriction endonuclease, b and c may be any deoxynucleotide, and n and m are integers such that n + m = 0, 1 or 2.

3. The specific cleavage linker of claim 1 or 2 wherein said linker codes for a specific cleavage sequence specifically recognized and cleavable by an enzyme selected from the group comprising enterokinase, kallikrein B and chymosin.

4. The specific cleavage linker of claim 1 wherein said linker comprises a deoxynucleotide sequence comprising a plus strand having the sequence 5'-(GAL)(n)AAJ-3' or 5'-(GAL)(n)WGZ-3', wherein
L is A, T, C or G,
J is A or G,
W is A if Z is A or G, or W is C if Z is A, G, C or T,
Z is A, T, C or G if W is C, or Z is A or G if W is A,
and n refers to the number of the triplet codons GAL in the deoxynucleotide sequence and may be 2, 3 or 4, said specific cleavage linker coding for the specific cleavage sequence specifically recognized and cleavable by enterokinase.

5. The specific cleavage linker of claim 4 wherein L is T or G, J is A, W is C and Z is T.

6. The specific cleavage linker of claim 4 wherein said linker also contains a deoxynucleotide sequence having the restriction site for Hin dIII endonuclease at the 5'-end.
7. The specific cleavage linker of claim 6 wherein the deoxynucleotide sequence having the restriction site for Hin dIII endonuclease comprises a plus strand having the sequence 5'-CCAAGCTTGG-3'.
8. A specific purification linker comprising a deoxynucleotide sequence coding for an amino acid sequence which is selectively bindable to a solid phase material.

9. The specific purification linker of claim 8 wherein said amino acid sequence is selected from the group comprising a polyanionic amino acid sequence, a polycationic amino acid sequence, a hydrophobic amino acid sequence and an immunogenic peptide.

10. The specific purification linker of claim 8 wherein said linker comprises a deoxynucleotide sequence comprising a plus strand having the sequence
5'-(GAL)(m)AAK(GAL)(n)AAJ-3', wherein
L is A, T, C or G,
K is T or C,
J is A or G, and
m and n refer to the number of the triplet codons GAL in the deoxynucleotide sequence, m may be 1, 2, 3 or 4 and n may be 2, 3 or 4.

11. The specific purification linker of claim 8 wherein said amino acid sequence comprises amino acids selected from the group consisting of leucine, isoleucine, valine or phenylalanine, said sequence being 10-50 amino acids in length.

12. A ribosomal binding site linker comprising a first deoxynucleotide sequence which is homologous to the 3'-end of 16S ribosomal RNA and a second deoxynucleotide sequence comprising 3-11 deoxynucleotides, said second deoxynucleotide sequence being joined to said first deoxynucleotide sequence.

13. The ribosomal binding site linker of claim 12 wherein 
said linker also contains a termination codon. 

14. The ribosomal binding site linker of claim 12 wherein said linker also contains a promoter segment.

15. The ribosomal binding site linker of claim 13 wherein said linker comprises a deoxynucleotide sequence comprising a plus strand having the sequence
5'-L(n)TAGGAGGAL(m)-3' or
5'-L(n)TAGGAGGATCAATTCC-3', wherein
L is A, T, C or G, n is 0, 1 or 2 and m is any integer from 3 to 11, n and m denoting the number of L nucleotides in the sequence.

16. The ribosomal binding site linker of claim 14 wherein said linker comprises a deoxynucleotide sequence comprising a plus strand having the sequence
5'-TATJATJAGGAGGAL(m)-3', wherein
J is A or G, L is A, T, C or G and m is any integer from 3 to 11 and denotes the number of L deoxynucleotides in the sequence.

17. A composite linker comprising a first deoxynucleotide sequence having a restriction site which is recognized and cleavable by a restriction endonuclease, a second deoxynucleotide sequence which is homologous to the 3'-end of 16S ribosomal RNA, a third deoxynucleotide sequence having 3-11 deoxynucleotides, a fourth deoxynucleotide sequence having the initiation codon, a fifth deoxynucleotide sequence coding for an amino acid sequence which is selectively bindable to a solid phase material and a sixth deoxynucleotide sequence coding for a specific cleavage sequence comprising a sequence of one or more amino acids which is specifically recognized and cleavable by enzymatic or chemical means, said first, second, third, fourth, fifth and sixth deoxynucleotide sequences being joined together in the direction of translation.
18. The composite linker of claim 17 wherein said linker comprises a deoxynucleotide sequence comprising a plus strand having the sequence 5'-CCAAGCTTGGAGGAGGATCAATTCCATGGALGALGALGALAAKGALGALGALGALAAJ-3' or 5'-CCAAGCTTGGAGGAGGATAATTCCATGGALGALGALGALAAKGALGALGALGALAAJ-3', wherein
L is A, T, C or G,
J is A or G, and
K is T or C.